Does the educational management model matter? New evidence from a quasiexperimental approach


Over the last two decades, a growing literature has explored whether the way in which publicly funded private schools are managed (a highly autonomous mode) is more effective at promoting students’ educational skills than the way public schools are managed (where decisions are highly centralized). Our paper contributes to this literature by providing new evidence from the Spanish experience. To this end, we use the Spanish assessment named “Evaluación de Diagnóstico,” a national yearly standardized test given to students in the fourth grade and administered by the Regional Educational Authorities. In particular, our data correspond to the assessment conducted in the Spanish region of Aragón in 2010. Our methodological strategy consists of the sequential application of two methods: propensity score matching and hierarchical linear models. Additionally, we test the sensitivity of our estimates to unobserved heterogeneity. Our results point to a slight advantage of the private management model of schools in promoting students’ scientific abilities and their acquisition of foreign language (English) skills.

Fig. 1


  1.

    Recent reviews of these educational policies can be found in the Handbook of the Economics of Education (Bettinger 2011; Epple et al. 2016; Urquiola 2016).

  2.

    For the sake of simplicity, hereinafter we refer to publicly funded, privately run schools simply as private schools, except when it is necessary to differentiate them from completely private independent schools. That is, in this paper a private school is understood as a publicly funded but self-governed school. Similarly, we refer to state funded and run schools as public schools. The main difference between them lies in the way they are administered, private schools being much more autonomous concerning process and personnel decisions (deciding on the purchase of supplies and on budget allocations within schools, hiring and rewarding teachers, choosing textbooks, instructional methods, and the like).

  3.

    This bias has its origin in the fact that attendance at a school, whether private or public, is not random but instead is conditioned by characteristics of the family background, which in turn are extremely important in the determination of educational outcomes (the family socioeconomic level, for example).

  4.

    In this sense, our paper meets Davies’ claim (2013, p. 880): “As debates over school choice become increasingly transnational, we need studies from a variety of settings to build a stockpile of international knowledge about school sectors and student achievement.”

  5.

    The average score of each competence across all schools is 500 and the standard deviation is 100 since, as established by the General Report on Diagnostic Evaluation in Aragón 2010, “the evaluation of each competence in Aragón as a whole is established at the level of the average scores transformed into a reference value which has been fixed at 500, with a standard deviation of 100.” In this respect, the approach of the Spanish Diagnostic Evaluation is similar to that of the evaluations of the PISA Project of the OECD. In Table 1, the average score differs from 500 because completely private independent schools, and schools situated in municipalities in which there is no choice between public and private schools, were removed from the sample.

  6.

    Escardibul and Villarroya (2009) and Mancebón and Ximénez-de-Embún (2014) offer some suggestions to cope with these inequalities in the distribution of students between public and private schools.

  7.

    The key point is that these characteristics are also chief determinants of educational outcomes. This circumstance underlies the self-selection bias problem threatening our estimates. Selection bias and/or endogeneity are widespread in educational research and constitute the main methodological problem encountered when trying to evaluate the effect of private schools on the academic performance of children (Lefebvre et al. 2011). This problem is inherent in all impact evaluations based on non-experimental studies (such as the 2010 Aragón ED).

  8.

    The most common estimands in non-experimental studies are the “average effect of the treatment on the treated” (ATT), which is the effect for those in the treatment group, and the “average treatment effect” (ATE), which is the effect on all individuals (treatment and control). Our interest lies in measuring the expected effect on the outcome if individuals in the population were randomly assigned to treatment, which is exactly what the ATE captures (Austin 2011). This parameter allows us to know what the performance of Spanish students would be if they attended a self-governing private school.

  9.

    As Imbens (2004, p. 11) states when referring to the combination of methods to estimate the ATE: “The motivation for these combinations is that although in principle any one of these methods can remove all of the bias associated with the covariates, combining two may lead to more robust inference. For example, matching leads to consistent estimators for average treatment effects under weak conditions, so matching and regression can combine some of the desirable variance properties of regression with the consistency of matching.”

  10.

    The assumption of selection on observables requires that conditional on the observed variables, the assignment to treatment is random.

  11.

    We are assuming homogeneity in response across observed covariates. Lehrer and Kordas (2013) demonstrate that when the treatment effects vary in an unsystematic manner with the true propensity score, there are gains from using a matching algorithm based on propensity scores estimated via binary regression quantiles.

  12.

    In the estimation of the propensity score, only those variables that could affect both the choice of a private school and the students’ academic performance were included. We therefore excluded variables that can potentially help explain differences in educational outcomes but do not influence the choice of school (study habits, for example), as well as variables that could determine that choice but do not influence the educational skills cited (the distance to the school, for example). In addition, only those variables which are potential predictors of educational outcomes and which occur prior to the choice of school (or were stable between the time of the choice of school and the time of the outcome assessment) were included as explanatory variables in equation 1 (Caliendo and Kopeinig 2008). All the observables are listed in Table 1, and case-wise deletion was used to handle missing data.

  13.

    The first of these (NNM) matches each treated individual with the non-treated individual having the most similar propensity score value. That is, in nearest neighbour matching, Stata selects the control(s) nearest to each treated observation for comparison. KM constructs matches using all the individuals in the potential control sample, drawing more information from closer matches and less from distant observations. In so doing, KM uses comparatively more information than other matching algorithms (Guo and Fraser 2010, chapter 7).

  14.

    We applied coarsened exact matching as a robustness check and obtained worse results in terms of the similarity between the treatment and control groups generated. Results are available upon request.

  15.

    Results supplied by the different matching estimation methods led to similar conclusions. They are not supplied here but are available from the authors upon request.

  16.

    Other results that test the matching quality are shown in the Appendix. In particular, Table 5 shows the differences in the average values of propensity scores and covariates for the whole sample and the paired sample. The last two rows in this table show the median absolute standardized bias (Rosenbaum and Rubin 1985) before and after matching. As can be inferred, KM has reduced covariate imbalance on all variables. Figure 2 shows graphically the pre- and post-matching bias for each of the variables included in the estimation of the propensity score. Figure 3 depicts the distribution of these same variables by type of school for the complete sample (figures on the left) and the matched sample (figures on the right).

  17.

    This command calculates the ATE as a weighted average of the ATT (average effect on the treated) and the ATU (average effect on the untreated). This is a very common definition of the ATE in the literature (see, for instance, Böckerman et al. 2013; Gangl 2014). An alternative way to calculate the ATE is to weight observations by the inverse of the estimated propensity scores (Hirano et al. 2003). To check the robustness of the ATE, we also calculated it with this method, i.e., using the inverse propensity scores as sampling weights. For this, we used Stata’s teffects module. Results are similar to those shown in Table 3 and are available upon request.

  18.

    For a mathematical demonstration, see DiPrete and Gangl (2004).

  19.

    In addition, we calculated the Hodges–Lehmann point estimates and their confidence intervals, obtaining the same critical values.

  20.

    Additionally, we test our estimation with another sensitivity analysis proposed by Ichino et al. (2008). This consists of calculating the ATE under different possible scenarios of deviation from the conditional independence assumption (CIA). To do so, the authors impose values on the parameters that characterize the distribution of an unobservable U in order to simulate its ability to generate bias, and then recalculate the parameter value including the influence of the simulated unobserved variable. Results are available upon request. This approach has been widely used in the literature (Binder and Coad 2013; Caliendo and Künn 2015, among others). Other types of sensitivity analysis have been proposed in the literature. For example, Altonji et al. (2005) applied a similar idea to the Heckman selection model.

  21.

    HLM are similar to OLS in the way they weight the observations (see Yitzhaki 1996 for a discussion of OLS weights). Both weight the observations differently from PSM. We are grateful to an anonymous referee for making this point. In any case, our purpose with the HLM is not to compare the ATE that it supplies with that obtained via PSM.

  22.

    Multilevel models, such as HLM, build on Moulton’s (1990) work on clustering. Moulton’s insight was that when individuals within an aggregate level are clustered, so that they are in fact more similar to one another than to individuals belonging to another cluster, the OLS assumption that observations are independent and identically distributed is violated. For this reason, OLS estimation can bias the estimated standard errors downward, leading the analyst to conclude that aggregate-level effects are statistically significant when in fact they are not. Multilevel models have the benefit of allowing for partial pooling of coefficients towards the completely pooled OLS estimate, which according to Gelman (2006) can be a more effective estimation strategy. Simulations using a dataset with students clustered within classrooms and classrooms within schools suggest that modelling the clustering of the data with a multilevel method is a better approach than clustering the standard errors of the OLS estimates (Cheah 2009).

  23.

    Bryk and Raudenbush (1988) recommend the use of this type of general model when analysing the effects of schools on educational outcomes. There are multiple applications of this methodology to the educational context. Among these are Willms (2006), Somers et al. (2004) and Mancebón et al. (2012), the last of these being applied to Spanish data from PISA 2006.

  24.

    Prior to estimating the HLM, we evaluated the appropriateness of applying it to our data. For this, we calculated the intra-class correlation (ICC) values of the null model for science and foreign language (English) performance (the two dependent variables of the regression). If the ICC were zero, a hierarchical model would not be necessary, since in this case none of the total variance of the scores would be explained by differences between students attending different classes or schools. Results of these calculations for an HLM at two levels and three levels are offered in the Appendix (Tables 6 and 7). These results (which show that the class level explains a small percentage of the variance of the results in foreign language (English), but a higher percentage of the results in science) lead us to apply a two-level model for achievement in a foreign language (English) and a three-level model for science. At any rate, the three-level model for English and the two-level model for science lead to the same conclusions; these results are available upon request.
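
Note 17 describes two routes to the ATE: a weighted average of the ATT and the ATU, and weighting observations by the inverse of the propensity score. The Python sketch below illustrates both ideas under stated assumptions. It is not the authors' Stata code: the function names, the use of group shares as weights, the normalized form of the inverse-probability weights, and all numbers in the usage note are hypothetical and chosen only for illustration.

```python
def ate_from_att_atu(att, atu, n_treated, n_untreated):
    """Combine ATT and ATU into the ATE, weighting each by its group share.

    Assumption: the weights are the sample shares of treated and
    untreated units.
    """
    n_total = n_treated + n_untreated
    if n_total <= 0:
        raise ValueError("at least one observation is required")
    return (n_treated / n_total) * att + (n_untreated / n_total) * atu


def ate_ipw(outcomes, treated, pscores):
    """Inverse-probability-weighted ATE with normalized weights.

    outcomes: observed scores
    treated:  1 for the treatment group (private school), 0 otherwise
    pscores:  estimated propensity scores, strictly between 0 and 1
    Assumption: weights are normalized within each group, which is one
    of several common variants of this estimator.
    """
    w_treated = [t / p for t, p in zip(treated, pscores)]
    w_control = [(1 - t) / (1 - p) for t, p in zip(treated, pscores)]
    mean_treated = sum(w * y for w, y in zip(w_treated, outcomes)) / sum(w_treated)
    mean_control = sum(w * y for w, y in zip(w_control, outcomes)) / sum(w_control)
    return mean_treated - mean_control
```

For example, `ate_from_att_atu(10.0, 4.0, 30, 70)` combines a hypothetical ATT of 10 points and ATU of 4 points for a sample in which 30% of students are treated. When every propensity score equals 0.5, `ate_ipw` reduces to the simple difference in group means, which is a convenient sanity check.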


  1. Allen R, Vignoles A (2015) Can school competition improve standards? The case of faith schools in England. Empir Econ 50(3):959–973

  2. Altonji J, Elder T, Taber C (2005) Selection on observed and unobserved variables: assessing the effectiveness of Catholic schools. J Polit Econ 113(1):151–184

  3. Altonji J, Elder T, Taber C (2008) Using selection on observed variables to assess bias from unobservables when evaluating Swan–Ganz catheterization. Am Econ Rev 98(2):345–350

  4. Austin P (2011) An introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivar Behav Res 46(3):399–424

  5. Böckerman P, Bryson A, Ilmakunnas P (2013) Does high involvement management lead to higher pay? J R Stat Soc Ser A Stat Soc 176(4):861–885

  6. Bernal J (2005) Parental choice, social class and market forces: the consequences of privatization of public services in education. J Educ Policy 20(6):779–792

  7. Bettinger E (2011) Educational vouchers in international contexts. In: Hanushek E, Machin S, Woessmann L (eds) Handbook of the economics of education, vol 4. Elsevier Science & Technology, North-Holland, pp 551–572

  8. Binder M, Coad A (2013) Life satisfaction and self-employment: a matching approach. Small Bus Econ 40(4):1009–1033

  9. Bradley S, Migali G, Taylor J (2013) Funding, school specialisation and test scores: an evaluation of the specialist schools policy using matching models. J Hum Cap 7(1):76–106

  10. Bryk A, Raudenbush S (1988) Toward a more appropriate conceptualization of research on school effects: a three-level hierarchical linear model. Am J Educ 97(1):65–108

  11. Caliendo M, Künn S (2015) Getting back into the labor market: the effects of start-up subsidies for unemployed females. J Popul Econ 28(4):1005–1043

  12. Caliendo M, Kopeinig S (2008) Some practical guidance for the implementation of propensity score matching. J Econ Surv 22(1):31–72

  13. Cheah B (2009) Clustering standard errors for modeling multilevel data. Technical report, Columbia University, New York

  14. Chowa G, Masa R, Wretman C, Ansong D (2013) The impact of household possessions on youth’s academic achievement in the Ghana Youthsave experiment: a propensity score analysis. Econ Educ Rev 33:69–81

  15. Chudgar A, Quin E (2012) Relationship between private schooling and achievement: results from rural and urban India. Econ Educ Rev 31(4):376–390

  16. Cohen J (1988) Statistical power analysis for the behavioral sciences, 2nd edn. Lawrence Erlbaum Associates, Hillsdale

  17. Coleman J, Hoffer T, Kilgore S (1982) Secondary school achievement. Public, Catholic and private schools compared. Basic Books, Inc. Publishers, New York

  18. Crespo E, Santín D (2014) Does school ownership matter? An unbiased efficiency comparison for regions of Spain. J Prod Anal 41(1):153–172

  19. Davies S (2013) Are there Catholic school effects in Ontario, Canada? Eur Sociol Rev 29(4):871–883

  20. DiPrete T, Gangl M (2004) Assessing bias in the estimation of causal effects: Rosenbaum bounds on matching estimators and instrumental variables estimation with imperfect instruments. WZB discussion paper SP I 2004-101, Wissenschaftszentrum Berlin für Sozialforschung

  21. Doncel L, Sainz J, Sanz I (2012) An estimation of the advantage of charter over public schools. Kyklos 65(4):442–463

  22. Epple D, Romano R, Zimmer R (2016) Charter schools: a survey of research on their characteristics and effectiveness. In: Hanushek E, Machin S, Woessmann L (eds) Handbook of the economics of education, vol 5. Elsevier Science & Technology, North-Holland, pp 139–208

  23. Escardibul JO, Villarroya A (2009) The inequalities in school choice in Spain in accordance to PISA data. J Educ Policy 24(6):673–695

  24. Gangl M (2014) Matching estimators for treatment effects. In: Best H, Wolf C (eds) The SAGE handbook of regression analysis and causal inference. SAGE Publications Ltd, London, pp 251–276

  25. Gelman A (2006) Multilevel (hierarchical) modeling: what it can and cannot do. Technometrics 48(3):432–435

  26. Green C, Navarro-Paniagua M, Ximénez de Embún D, Mancebón M (2014) School choice and student wellbeing. Econ Educ Rev 38:139–150

  27. Gronberg T, Jansen D (2001) Navigating newly chartered waters. An analysis of charter school performance. Texas Public Policy Foundation, Austin

  28. Guo S, Fraser M (2010) Propensity score analysis. Statistical methods and applications. SAGE Publications Ltd, London

  29. Hanushek E, Woessmann L (2014) Institutional structures of the education system and student achievement: a review of cross-country economic research. In: Strietholt R, Bos W, Gustafsson JE, Rosen M (eds) Educational policy evaluation through international comparative assessments. Waxmann, Münster

  30. Hanushek E, Kain J, Rivkin S, Branch F (2007) Charter school quality and parental decision making with school choice. J Public Econ 91(5–6):823–848

  31. Herron M (1999) Postestimation uncertainty in limited dependent variable models. Polit Anal 8(1):83–98

  32. Hirano K, Imbens G, Ridder G (2003) Efficient estimation of average treatment effects using the estimated propensity score. Econometrica 71(4):1161–1189

  33. Ichino A, Mealli F, Nannicini T (2008) From temporary help jobs to permanent employment: what can we learn from matching estimators and their sensitivity? J Appl Econ 23(3):305–327

  34. Imbens GW (2004) Nonparametric estimation of average treatment effects under exogeneity: a review. Rev Econ Stat 86(1):4–29

  35. Kim Y (2011) Catholic schools or school quality? The effects of Catholic schools on labor market outcomes. Econ Educ Rev 30(3):546–558

  36. Lee BK, Lessler J, Stuart EA (2010) Improving propensity score weighting using machine learning. Stat Med 29(3):337–346

  37. Lee M, Lee S (2009) Sensitivity analysis of job-training effects on reemployment for Korean women. Empir Econ 36(1):81–107

  38. Lefebvre P, Merrigan P, Verstraete M (2011) Public subsidies to private schools do make a difference for achievement in mathematics: longitudinal evidence from Canada. Econ Educ Rev 30(1):79–98

  39. Lehrer S, Kordas G (2013) Matching using semiparametric propensity scores. Empir Econ 44(1):13–45

  40. LODE (1985) Organic Law 8/1985, 3 July, Regulating education. Official Spanish State Bulletin 159

  41. Mancebón M, Ximénez-de-Embún DP (2014) Equality of school choice: a study applied to the Spanish region of Aragón. Educ Econ 22(1):90–111

  42. Mancebón M, Calero J, Choi A, Ximénez-de-Embún DP (2012) The efficiency of public and publicly-subsidized high schools in Spain. Evidence from PISA-2006. J Oper Res Soc 63(11):1516–1533

  43. McCaffrey D, Ridgeway G, Morral A (2004) Propensity score estimation with boosted regression for evaluating causal effects in observational studies. Psychol Methods 9(4):403–425

  44. Moulton B (1990) An illustration of a pitfall in estimating the effects of aggregate variables on micro units. Rev Econ Stat 72(2):334–338

  45. Murnane R, Willett J (2011) Methods matter. Oxford University Press, New York

  46. Peel M (2014) Addressing unobserved endogeneity bias in accounting studies: control and sensitivity methods by variable type. Account Bus Res 44(5):545–571

  47. Rehm P (2005) Citizen support for the welfare state: determinants of preferences for income redistribution. WZB markets and political economy working paper SP II 2, Wissenschaftszentrum Berlin für Sozialforschung

  48. Rosenbaum P (2002) Observational studies. Springer, New York

  49. Rosenbaum P, Rubin D (1983) The central role of the propensity score in observational studies for causal effects. Biometrika 70(1):41–55

  50. Rosenbaum P, Rubin D (1985) The bias due to incomplete matching. Biometrics 41(1):103–116

  51. Rubin D, Thomas N (2000) Combining propensity score matching with additional adjustments for prognostic covariates. J Am Stat Assoc 95(450):573–585

  52. Setoguchi S, Schneeweiss S, Brookhart MA, Glynn RJ, Cook EF (2008) Evaluating uses of data mining techniques in propensity score estimation: a simulation study. Pharmacoepidemiol Drug Saf 17(6):546–555

  53. Smith J, Todd P (2005) Does matching overcome LaLonde’s critique of non-experimental estimators? J Econ 125(1–2):305–353

  54. Somers M, McEwan P, Willms J (2004) How effective are private schools in Latin America? Comp Educ Rev 48(1):48–69

  55. Stuart E (2010) Matching methods for causal inference: a review and a look forward. Stat Sci 25(1):1–21

  56. Spanish Ministry of Education (2013) Spanish Education Statistics. Madrid

  57. Urquiola M (2016) Competition among schools. Traditional public and private schools. In: Hanushek E, Machin S, Woessmann L (eds) Handbook of the economics of education, vol 5. Elsevier Science & Technology, North-Holland, pp 209–237

  58. Willms J (2006) Learning divides: ten policy questions about the performance and equity of schools and schooling systems. UIS working paper 5. UNESCO Institute for Statistics, Montreal

  59. Yitzhaki S (1996) On using linear regressions in welfare economics. J Bus Econ Stat 14(4):478–486

  60. Zimmer R, Gill B, Booker K, Lavertu S, Witte J (2012) Examining charter student achievement effects across seven states. Econ Educ Rev 31(2):213–224

Author information



Corresponding author

Correspondence to María Jesús Mancebón.

Additional information

The authors are grateful for the financial support received from the Spanish Government, Ministry of Economics and Competitiveness (Project EDU2013-42480-R). Mauro Mediavilla and Domingo P. Ximénez-de-Embún also acknowledge the support from Fundación Ramón Areces. We thank the editor, two anonymous referees and the associate editor for their helpful comments.



See Tables 5, 6, 7, 8 and Figs. 2 and 3.

Table 5 Average differences based on school type for the variables in the pre- and post-matching samples and bias reduction
Fig. 2

Standardized pre- and post-matching bias between public and private schools
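
The standardized bias summarized in Table 5 and plotted in Fig. 2 is commonly computed, for each covariate, as the difference in group means expressed as a percentage of the square root of the average of the two group variances. The Python sketch below shows that calculation and the median absolute value across covariates. It is a generic illustration, not the authors' code: the function names and the toy inputs in the usage note are hypothetical, and which samples (unmatched or matched) supply the variances is left to the caller.

```python
from statistics import fmean, median, variance


def standardized_bias(treated_values, control_values):
    """Percentage standardized difference in means for one covariate."""
    mean_gap = fmean(treated_values) - fmean(control_values)
    # Square root of the average of the two sample variances.
    pooled_sd = ((variance(treated_values) + variance(control_values)) / 2) ** 0.5
    if pooled_sd == 0:
        raise ValueError("the covariate has no variation in either group")
    return 100 * mean_gap / pooled_sd


def median_absolute_bias(covariate_pairs):
    """Median of |standardized bias| over (treated, control) covariate pairs."""
    return median(abs(standardized_bias(t, c)) for t, c in covariate_pairs)
```

Calling `standardized_bias` once on the full sample and once on the matched sample for the same covariate gives the kind of before-and-after comparison the figure reports; a smaller absolute value after matching indicates better balance.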

Fig. 3

Distribution of the variables in the unmatched and matched samples

Table 6 HLM regression: random effects (3-levels)
Table 7 HLM regression: random effects (2-levels)
Table 8 Estimation of fixed effects with robust standard errors in the HLM
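
The random-effects tables listed above feed the intra-class correlation (ICC) check described in note 24: the share of total score variance attributable to each level of the hierarchy. The Python sketch below shows that arithmetic for a null model with up to three levels. It is a minimal illustration, not the authors' procedure: the function name is hypothetical, and any variance components passed to it are the caller's own, since none of the paper's estimates are reproduced here.

```python
def variance_shares(student_var, class_var=0.0, school_var=0.0):
    """Share of total variance at each level of a null multilevel model.

    For a two-level model (students within schools), leave class_var at 0.
    The "class" and "school" shares are the intra-class correlations.
    """
    components = {"student": student_var, "class": class_var, "school": school_var}
    if any(value < 0 for value in components.values()):
        raise ValueError("variance components must be non-negative")
    total = sum(components.values())
    if total == 0:
        raise ValueError("total variance must be positive")
    return {level: value / total for level, value in components.items()}
```

A share close to zero at the class level, as note 24 reports for foreign language (English), suggests that the level adds little and a two-level model may suffice; a larger share, as reported for science, supports keeping the third level.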

Cite this article

Mancebón, M.J., Ximénez-de-Embún, D.P., Mediavilla, M. et al. Does the educational management model matter? New evidence from a quasiexperimental approach. Empir Econ 56, 107–135 (2019).


  • School choice
  • Propensity score matching
  • Hierarchical linear models
  • Unobservable variables bias
  • Science and Foreign Language (English) skills
  • Primary schools

JEL Classification

  • I21
  • I29