Semiparametric estimation of conditional mean functions with missing data

Combining parametric moments with matching

  • Original Paper
Empirical Economics

Abstract

This paper proposes a new semiparametric estimator of conditional expectation functions from incomplete data that integrates parametric regression with nonparametric matching estimators. Besides its applicability to missing data situations due to non-response or attrition, the estimator can also be used for analyzing treatment effect heterogeneity and statistical treatment rules, where data on potential outcomes are missing by definition. By combining moments from a parametric specification with nonparametric estimates of mean outcomes in the non-responding population within a GMM framework, the estimator seeks to balance a good fit in the responding population with low bias in the non-responding population. The estimator is applied to analyzing treatment effect heterogeneity among Swedish rehabilitation programmes.
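The idea of stacking parametric moments with matching-based moments can be sketched as follows. This is only an illustration under strong assumptions, not the paper's estimator: it assumes a linear specification m(x; θ) = x'θ, a single outcome, an identity weighting matrix by default, and counterfactual predictions `m_hat` already supplied by some matching routine. The function name `gmm_linear` and all argument names are hypothetical.

```python
import numpy as np

def gmm_linear(X, y, d, m_hat, groups, weight=None):
    """Sketch of a GMM fit that stacks two sets of moments.

    X      : (n, k) regressors, including a constant
    y      : (n,) outcomes, used only where d == 1
    d      : (n,) response indicator (1 = outcome observed)
    m_hat  : (n,) nonparametric outcome predictions, used only where d == 0
    groups : (n, L) 0/1 indicators of the subpopulations
    weight : optional (k + L, k + L) weighting matrix (identity if None)
    """
    n = X.shape[0]
    resp = (d == 1).astype(float)
    non = 1.0 - resp
    y_obs = np.where(d == 1, y, 0.0)      # ignore unobserved outcomes
    m_non = np.where(d == 0, m_hat, 0.0)  # ignore predictions for respondents

    # Upper k moments: least-squares normal equations among respondents,
    # (1/n) sum D_i X_i (Y_i - X_i' theta) = a_top - B_top theta.
    a_top = X.T @ (resp * y_obs) / n
    B_top = (X * resp[:, None]).T @ X / n

    # Lower L moments: in each subpopulation, the mean parametric prediction
    # among non-respondents should equal the mean matching prediction,
    # (1/n) sum (1 - D_i) Lambda_l(X_i) (m_hat_i - X_i' theta).
    G = groups * non[:, None]
    a_bot = G.T @ m_non / n
    B_bot = G.T @ X / n

    a = np.concatenate([a_top, a_bot])
    B = np.vstack([B_top, B_bot])
    W = np.eye(a.size) if weight is None else weight

    # The moments are linear in theta, so the GMM minimiser of
    # (a - B theta)' W (a - B theta) has a closed form.
    return np.linalg.solve(B.T @ W @ B, B.T @ W @ a)

# Noise-free toy data: both sets of moments agree with the true parameter.
rng = np.random.default_rng(0)
n = 400
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
theta_true = np.array([1.0, 2.0, -0.5])
y = X @ theta_true
d = (rng.uniform(size=n) < 0.6).astype(int)
m_hat = X @ theta_true
groups = np.column_stack([X[:, 1] > 0, X[:, 1] <= 0]).astype(float)
theta_hat = gmm_linear(X, y, d, m_hat, groups)
```

With noisy data and a misspecified parametric model the two sets of moments generally conflict, and the weighting matrix then governs how fit among respondents is traded off against bias among non-respondents.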

Notes

  1. The estimation of average treatment effects has been intensively analyzed, in particular for active labour market programmes and rehabilitation programmes. See, for example, Aakvik (2003), Abbring and van den Berg (2004) and the Special Issue on ‘Long term unemployment and social assistance’ (Empirical Economics (1/2), 1998). The focus of this paper is on the heterogeneity in treatment effects, which could be exploited to improve the average effectiveness of policies through a better participant allocation.

  2. For a more detailed discussion see Wald (1950); Heckman et al. (1997b); Black et al. (2003); Manski (2000; 2004) and Dehejia (2004).

  3. See Rubin (1974); Heckman and Robb (1985); Barnow et al. (1981); Lechner (1999).

  4. E.g. in the case of panel attrition, X may refer to information collected in the baseline period.

  5. See e.g. Angrist (1998), Heckman et al. (1998b); Dehejia and Wahba (1999); Lechner (1999); Gerfin and Lechner (2002) and Jalan and Ravallion (2003), among many others.

  6. The support restriction is incorporated by considering only observations with \(\widehat{p}_{i} > 0\), because \(S_{x} = \{x : f_{X|D=1}(x) > 0\} = \{x : p(x) > 0\}\).

  7. This permits a simpler derivation of the asymptotic properties. For the practical implementation, either Eq. 5 or Eq. 6 can be used.

  8. More precisely, let \(\widehat{m}_{vl}(\rho)\) for \(\rho > 0\) be an estimator of the expectation \(E[Y_{v} \mid p(X) = \rho, \Lambda_{l}(X) = 1]\), i.e. the expectation of the v-th variable of the outcome vector Y conditional on the propensity score in the l-th subpopulation. Let \(\widehat{m}_{l}(\cdot) = \left( \widehat{m}_{1l}(\cdot), \ldots, \widehat{m}_{vl}(\cdot), \ldots, \widehat{m}_{Vl}(\cdot) \right)^{\prime}\) be the element-wise-defined estimator of the outcome vector Y in the population l, i.e. of \(E[Y \mid p(X) = \rho, \Lambda_{l}(X) = 1]\). Stacking these estimators for the L subpopulations and multiplying element-wise with the population indicator function gives \(\widehat{m}_{VL}(\widehat{p}(X_{i})) = \left( \widehat{m}^{\prime}_{1}(\widehat{p}(X_{i})) \cdot \Lambda_{1}(X_{i}), \ldots, \widehat{m}^{\prime}_{l}(\widehat{p}(X_{i})) \cdot \Lambda_{l}(X_{i}), \ldots, \widehat{m}^{\prime}_{L}(\widehat{p}(X_{i})) \cdot \Lambda_{L}(X_{i}) \right)^{\prime}\).

  9. When a standard propensity score matching routine is used, care should be exercised to ensure that the lower VL moments in (10) are summed over the same observations as in \(\widehat{\mu}\) and are scaled in the same way. For example, if the propensity score matching routine estimates the mean counterfactual outcome \(\frac{\sum \widehat{m}_{VL}(\widehat{p}_{i})(1 - D_{i})\,1(\widehat{p}_{i} > 0)}{\sum (1 - D_{i})\,1(\widehat{p}_{i} > 0)}\) instead of \(\frac{\sum \widehat{m}_{VL}(\widehat{p}_{i})(1 - D_{i})\,1(\widehat{p}_{i} > 0)}{n}\), then the VL moments must also be scaled accordingly.

  10. This includes one-to-one or pair matching.

  11. Using an Epanechnikov kernel instead of a Gaussian kernel, or vice versa, led to largely similar results.

  12. The X data are scaled in the estimator to mean zero and variance one.

  13. The expected outcomes vary considerably among these subpopulations. Whereas with DGP 1, the expected outcome is 13.1 for the respondents and 5.3 for the non-respondents, the outcome difference between respondents and non-respondents can be as large as 8.2 (for subpopulations ten and eleven) and as small as 0.8 (for subpopulation fourteen). Similar heterogeneity occurs for DGP 2 and 3. For instance, in DGP 2 the expected outcome for the respondents is usually larger than for the non-respondents, but this relationship is reversed in subpopulation five. In DGP 2, the expected outcomes for respondents and non-respondents are 2.2 and 1.5, respectively, and in DGP 3 these figures are 9.6 and 4.3.

  14. See Angrist and Krueger (1999) and Heckman et al. (1999) for an overview.

  15. Unless the past participants have been assigned randomly to the programmes.

  16. Regularly employed individuals receive sickness benefits from the employer for the first two weeks and from the insurance office afterwards. Unemployed and self-employed individuals receive benefits directly from the insurance office. Sickness benefits amount to 80% of previous earnings, adjusted for the degree of lost working capacity and capped at an upper ceiling, and can be received for an unlimited period.

  17. Medical and social rehabilitation are not coordinated by the insurance office.

  18. The insurance offices themselves do not conduct rehabilitative activities.

  19. A number of cases received more than one type of rehabilitation. Since it is known neither whether these measures were given in parallel or sequentially nor in which order they were received, these cases were assigned to the supposedly first or principal of the rehabilitative measures received. In most cases this has been medical rehabilitation, which is likely to be the first measure. The second priority is given to workplace rehabilitation, since workplace rehabilitation is usually full-time while educational training may operate alongside. For further details on the data see Frölich et al. (2004).

  20. The reason for the latter is that the assessment refers to vocational rehabilitation.
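The scaling issue raised in note 9 can be illustrated with a small numerical sketch. It assumes a scalar outcome and hypothetical inputs (`m_hat`, `p_hat`, `d`), and the function `matching_moment` is made up for this illustration. It shows that the same sum over non-respondents on the support (\(\widehat{p}_{i} > 0\), as in note 6) gives different values depending on whether it is divided by n or by the number of retained non-respondents.

```python
import numpy as np

def matching_moment(m_hat, p_hat, d, scale="n"):
    """Sum of matching predictions over non-respondents with p_hat > 0,
    divided either by the full sample size ("n") or by the number of
    retained non-respondents ("n0")."""
    keep = (d == 0) & (p_hat > 0)              # non-respondents on the support
    total = np.sum(np.where(keep, m_hat, 0.0))
    if scale == "n":
        return total / d.size
    if scale == "n0":
        return total / np.sum(keep)
    raise ValueError("scale must be 'n' or 'n0'")

# Toy data: two respondents, three non-respondents, one of them off the support.
d = np.array([1, 1, 0, 0, 0])
p_hat = np.array([0.6, 0.7, 0.3, 0.0, 0.5])
m_hat = np.array([np.nan, np.nan, 4.0, 9.0, 6.0])

by_n = matching_moment(m_hat, p_hat, d, scale="n")    # (4 + 6) / 5
by_n0 = matching_moment(m_hat, p_hat, d, scale="n0")  # (4 + 6) / 2
share = np.mean((d == 0) & (p_hat > 0))               # 2 / 5
```

The two versions differ by the factor `share`, so output from a routine that reports the second form would need to be multiplied by that share before being compared with moments that are divided by n.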

References

  • Aakvik A (2003) Estimating the employment effects of education for disabled workers in Norway. Empir Econ 28:515–533

  • Abbring J, van den Berg G (2004) Analyzing the effect of dynamically assigned treatments using duration models, binary treatment models, and panel data models. Empir Econ 29:5–20

  • Angrist J (1998) Estimating labour market impact of voluntary military service using social security data. Econometrica 66:249–288

  • Angrist J, Krueger A (1999) Empirical strategies in labor economics. In: Ashenfelter O, Card D (eds) The handbook of labor economics, III. North-Holland, New York, pp 1277–1366

  • Barnow B, Cain G, Goldberger A (1981) Selection on observables. Evaluation Studies Review Annual 5:43–59

  • Black D, Smith J, Berger M, Noel B (2003) Is the threat of reemployment services more effective than the services themselves? Evidence from random assignment in the UI system. Am Econ Rev 93:1313–1327

  • Dehejia R (2004) Program evaluation as a decision problem. Forthcoming in J Econ

  • Dehejia R, Wahba S (1999) Causal effects in non-experimental studies: reevaluating the evaluation of training programmes. J Am Stat Assoc 94:1053–1062

  • Fan J (1992) Design-adaptive nonparametric regression. J Am Stat Assoc 87:998–1004

  • Frölich M (2004) Finite sample properties of propensity-score matching and weighting estimators. Rev Econ Stat 86:77–90

  • Frölich M (2005) Matching estimators and optimal bandwidth choice. Stat Comput 15(3):197–215

  • Frölich M, Heshmati A, Lechner M (2004) A microeconometric evaluation of rehabilitation of long-term sickness in Sweden. J Appl Econ 19:375–396

  • Gerfin M, Lechner M (2002) Microeconometric evaluation of the active labour market policy in Switzerland. Econ J 112:854–893

  • Hahn J (1998) On the role of the propensity score in efficient semiparametric estimation of average treatment effects. Econometrica 66:315–331

  • Hansen LP (1982) Large sample properties of generalized method of moments estimators. Econometrica 50:1029–1054

  • Heckman J, Robb R (1985) Alternative methods for evaluating the impact of interventions. In: Heckman J, Singer B (eds) Longitudinal analysis of labour market data. Cambridge University Press, Cambridge

  • Heckman J, Ichimura H, Todd P (1997a) Matching as an econometric evaluation estimator: evidence from evaluating a job training programme. Rev Econ Stud 64:605–654

  • Heckman J, Smith J, Clements N (1997b) Making the most out of programme evaluations and social experiments: accounting for heterogeneity in programme impacts. Rev Econ Stud 64:487–535

  • Heckman J, Ichimura H, Todd P (1998a) Matching as an econometric evaluation estimator. Rev Econ Stud 65:261–294

  • Heckman J, Ichimura H, Smith J, Todd P (1998b) Characterizing selection bias using experimental data. Econometrica 66:1017–1098

  • Heckman J, LaLonde R, Smith J (1999) The economics and econometrics of active labour market programs. In: Ashenfelter O, Card D (eds) The handbook of labor economics, III. North-Holland, New York, pp 1865–2097

  • Jalan J, Ravallion M (2003) Estimating the benefit incidence of an antipoverty program by propensity-score matching. J Bus Econ Stat 21:19–30

  • Lechner M (1999) Earnings and employment effects of continuous off-the-job training in east Germany after unification. J Bus Econ Stat 17:74–90

  • Little R, Rubin D (1987) Statistical analysis with missing data. Wiley, New York

  • Manski C (2000) Identification problems and decisions under ambiguity: empirical analysis of treatment response and normative analysis of treatment choice. J Econ 95:415–442

  • Manski C (2004) Statistical treatment rules for heterogeneous populations. Econometrica 72:1221–1246

  • Rosenbaum P, Rubin D (1983) The central role of the propensity score in observational studies for causal effects. Biometrika 70:41–55

  • Rubin D (1974) Estimating causal effects of treatments in randomized and nonrandomized studies. J Educ Psychol 66:688–701

  • Seifert B, Gasser T (1996) Finite-sample variance of local polynomials: analysis and solutions. J Am Stat Assoc 91:267–275

  • Seifert B, Gasser T (2000) Data adaptive ridging in local polynomial regression. J Comput Graph Stat 9:338–360

  • Wald A (1950) Statistical decision functions. Wiley, New York

Acknowledgment

The author is also affiliated with the Institute for the Study of Labor (IZA), Bonn. I am grateful to Bo Honoré, Francois Laisney, Michael Lechner, Ruth Miquel, Oivind Nilsen, Jeff Smith, the editor and three anonymous referees for discussions and comments. This research was supported by the Swiss National Science Foundation (project NSF 4043-058311) and the Grundlagenforschungsfonds HSG (project G02110112).

Author information

Correspondence to Markus Frölich.

Appendices

Appendix A: Monte Carlo results

Table A.1 Mean squared error (sample size 2,000, ridge matching estimator)
Table A.2 Mean squared error (sample size 500, kernel matching estimator)
Table A.3 Mean squared error (sample size 2,000, kernel matching estimator)

Appendix B: Swedish rehabilitation programmes

This Appendix contains additional tables on the estimation of optimal programme choices. Further results on alternative specifications are available in the supplementary Appendix.

Table B.1 gives the observed treatment outcomes and the nonparametrically estimated counterfactual outcomes for the 11 populations used in the GMM estimator in Section 4. The entry 48.3 in the top left, for example, indicates that among all the participants in No rehabilitation, an employment rate of 48.3% was observed. The potential No rehabilitation outcome for those who did not participate in No rehabilitation is estimated to be 43.6%. The respective figures for the 46–55 year olds are 49.8% and 41.5%. The mean counterfactual outcomes are estimated separately for each population by ridge matching with the bandwidth value chosen by least-squares cross-validation from the grid {0.02, 0.04, ..., 1}.
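Bandwidth selection by least-squares cross-validation over the grid {0.02, 0.04, ..., 1} can be sketched as below. As a simplification, the sketch uses leave-one-out cross-validation for a Nadaraya-Watson regression with a Gaussian kernel rather than the ridge matching estimator used in the paper. The function name `loo_cv_bandwidth` and the simulated data are hypothetical.

```python
import numpy as np

def loo_cv_bandwidth(p, y, grid):
    """Pick the bandwidth that minimises the leave-one-out squared
    prediction error of a Nadaraya-Watson regression of y on p."""
    diff = p[:, None] - p[None, :]
    best_h, best_score = None, np.inf
    for h in grid:
        K = np.exp(-0.5 * (diff / h) ** 2)  # Gaussian kernel weights
        np.fill_diagonal(K, 0.0)            # leave each observation out
        denom = K.sum(axis=1)
        ok = denom > 0                      # skip points with no neighbours
        if not ok.any():
            continue
        fit = (K @ y)[ok] / denom[ok]
        score = np.mean((y[ok] - fit) ** 2)
        if score < best_score:
            best_h, best_score = h, score
    return best_h, best_score

# The grid from the text: 0.02, 0.04, ..., 1 (50 candidate values).
grid = 0.02 * np.arange(1, 51)

# Simulated propensity scores and outcomes, for illustration only.
rng = np.random.default_rng(0)
p = rng.uniform(0.05, 0.95, size=200)
y = np.sin(2 * np.pi * p) + rng.normal(scale=0.3, size=200)
h_star, cv_score = loo_cv_bandwidth(p, y, grid)
```

In the setting described above, such a search would be run separately for each subpopulation, so the selected bandwidth may differ across them.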

Table B.1 Observed outcomes and estimated counterfactual outcomes in the 11 subpopulations
Table B.2 Average characteristics by treatment group: optimal vs. actual allocation

About this article

Cite this article

Frölich, M. Semiparametric estimation of conditional mean functions with missing data. Empirical Economics 31, 333–367 (2006). https://doi.org/10.1007/s00181-005-0019-4

  • DOI: https://doi.org/10.1007/s00181-005-0019-4
