An Introduction to Nonparametric Regression for Labor Economists

Journal of Labor Research

Abstract

In this article, we provide an overview of nonparametric (spline and kernel) regression methods and illustrate how they may be used in labor economics applications. We focus on issues commonly found in the labor literature, such as how to account for endogeneity via instrumental variables in a nonparametric setting. We showcase these methods using data from the Current Population Survey.

Data Availability

Upon publication, all data and R code used in this paper will be available on the publisher’s website.

Notes

  1. Fixing the sample to college educated males allows us to plot these figures in two dimensions.

  2. See https://stat.ethz.ch/R-manual/R-devel/library/splines/html/bs.html and the seemingly equivalent bSpline (⋅) function in the splines2 package.

  3. The matrix of the coefficients being

  4. Note that the last term \(-\lambda^{2}C\) disappears because it does not influence the solution.

  5. We raise λ to the power of 2p because of the way we add bases. Intuitively, if we transform X into αX for any α > 0, we want the equivalent transformation of the smoothing parameter, λ → αλ, to give the same fit.

  6. Note that to compute our CV statistics, we transformed Eq. 18 to avoid the high computational cost of calculating n versions of \(\widehat {m}_{-i}(x_{i};\lambda )\) (i.e., the order-\(n^{2}\) algorithm), using the fast order-n algorithm of Hutchinson and De Hoog (1985).

  7. Montoya et al. (2014) use a simulation to test the performance of different knot selection methods with equidistant knots in a P-spline model. Specifically, they compare the default choice method presented in Ruppert et al. (2003) with the myopic algorithm and the full search algorithm for knot selection. Their results show that the default choice method performs as well as or better than the other methods under several commonly used smoothing parameter selection methods.

  8. While there is no theoretical justification for doing so, it is common to use rule-of-thumb methods designed for density estimation as a form of exploratory analysis. In fact, we used a rule-of-thumb to compute the bandwidth in our previous examples (“Nonparametric Regression”). In its general form, the bandwidth (designed for Gaussian densities with a Gaussian kernel) is \(h_{rot} = 1.06\,\sigma _{x}n^{-1/5}\), where \(\sigma _{x}\) is the standard deviation of x. For the remainder of the article, we will use bandwidths selected via cross-validation.

  9. Those IVs include, but are not limited to: minimum school-leaving age, quarter of birth, school costs, proximity to schools, loan policies, school reforms, spouse’s and parents’ education/income.

  10. Recall that in our previous examples, the level of education is fixed at 16 years – college degree.

  11. Multiple endogenous regressors can be handled by running separate first stage regressions and putting the residuals from each of those regressions into the second stage regression and finally summing over i to obtain the conditional mean estimates.

  12. The acceptable range for γ is between \(\left (2\left (p_{2} + 1\right )+q_{1}+ 1\right )^{-1}\max \left [\frac {p_{2}+ 1}{p_{1}+ 1},\frac {p_{2}+ 3}{2\left (p_{1}+ 1\right )}\right ]\) and \(\left (2\left (p_{2} + 1\right )+q_{1}+ 1\right )^{-1}\frac {p_{2}+q_{1}}{q_{1}+q_{2}}\), where q1 and q2 represent the number of elements in the first and second stage regressions, respectively.

  13. We could combine spline and kernel methods to obtain an IV estimator as in Ozabaci et al. (2014). Using this combination allows for a lower computational burden and oracally efficient estimates.
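The bandwidth choices discussed in Notes 6 and 8 can be illustrated with a minimal sketch. It assumes a local-constant (Nadaraya-Watson) estimator with a Gaussian kernel, simulated data rather than the Current Population Survey, and a rule-of-thumb scale equal to the sample standard deviation of x. The function names are hypothetical and are not taken from the authors' R code. The leave-one-out shortcut shown here is the standard identity for linear smoothers, \(y_{i} - \widehat {m}_{-i}(x_{i}) = (y_{i} - \widehat {m}(x_{i}))/(1 - S_{ii})\), and not the order-n algorithm of Hutchinson and De Hoog (1985); it avoids refitting n times but still builds an n × n smoother matrix.

```python
import numpy as np


def rule_of_thumb_bandwidth(x):
    """Gaussian rule-of-thumb bandwidth: 1.06 * sd(x) * n^(-1/5)."""
    x = np.asarray(x, dtype=float)
    return 1.06 * x.std(ddof=1) * x.size ** (-1 / 5)


def smoother_matrix(x, h):
    """Matrix S such that S @ y gives the Nadaraya-Watson fit at each x_i."""
    x = np.asarray(x, dtype=float)
    u = (x[:, None] - x[None, :]) / h
    k = np.exp(-0.5 * u ** 2)  # Gaussian kernel (constant cancels in the ratio)
    return k / k.sum(axis=1, keepdims=True)


def loocv_shortcut(x, y, h):
    """Leave-one-out CV score without refitting, via the diagonal of S."""
    s = smoother_matrix(x, h)
    loo_resid = (y - s @ y) / (1.0 - np.diag(s))
    return np.mean(loo_resid ** 2)


def loocv_brute_force(x, y, h):
    """Leave-one-out CV score by refitting n times (for comparison only)."""
    n = len(x)
    errs = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        w = np.exp(-0.5 * ((x[i] - x[keep]) / h) ** 2)
        errs[i] = y[i] - np.sum(w * y[keep]) / np.sum(w)
    return np.mean(errs ** 2)


# Simulated data (an assumption for illustration only).
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=200)
y = np.sin(x) + rng.normal(scale=0.3, size=200)

# Exploratory bandwidth from the rule of thumb.
h_rot = rule_of_thumb_bandwidth(x)

# Cross-validated bandwidth: minimize the CV score over a coarse grid.
grid = np.linspace(0.1, 2.0, 40)
scores = np.array([loocv_shortcut(x, y, h) for h in grid])
h_cv = grid[np.argmin(scores)]

print("rule-of-thumb bandwidth:", h_rot)
print("cross-validated bandwidth:", h_cv)
```

In practice a numerical optimizer would usually replace the grid search; the grid is used here only to keep the sketch short.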

References

  • Cameron AC, Trivedi PK (2010) Microeconometrics using Stata, vol 2. Stata Press, College Station

  • Eilers PHC, Marx BD (1996) Flexible smoothing with B-splines and penalties. Stat Sci 11(2):89–102

  • Eilers PHC, Marx BD (2010) Splines, knots, and penalties. Wiley Interdiscip Rev Comput Stat 2(6):637–653

  • Hall P, Racine JS (2015) Infinite-order cross-validated local polynomial regression. J Econom 185:510–525

  • Hayfield T, Racine JS (2008) Nonparametric econometrics: the np package. J Stat Softw 27:1–32

  • Henderson DJ, Parmeter CF (2015) Applied nonparametric econometrics. Cambridge University Press, Cambridge

  • Henderson DJ, Polachek SW, Wang L (2011) Heterogeneity in schooling rates of return. Econ Educ Rev 30:1202–1214

  • Hutchinson MF, De Hoog FR (1985) Smoothing noisy data with spline functions. Numer Math 47(1):99–106

  • Ma S, Racine JS, Yang L (2015) Spline regression in the presence of categorical predictors. J Appl Econom 30:705–717

  • Montoya EL, Ulloa N, Miller V (2014) A simulation study comparing knot selection methods with equally spaced knots in a penalized regression spline. Int J Stat Probab 3(3):96

  • Nadaraya EA (1964) On estimating regression. Theory Probab Its Appl 9(1):141–142

  • Newey WK, Powell JL, Vella F (1999) Nonparametric estimation of triangular simultaneous equations models. Econometrica 67(3):565–603

  • Ozabaci D, Henderson DJ, Su L (2014) Additive nonparametric regression in the presence of endogenous regressors. J Bus Econ Stat 32(4):555–575

  • Ruppert D, Wand MP, Carroll RJ (2003) Semiparametric regression. Cambridge University Press, Cambridge

  • Su L, Ullah A (2008) Local polynomial estimation of nonparametric simultaneous equations models. J Econom 144:193–218

  • Watson GS (1964) Smooth regression analysis. Sankhyā: The Indian Journal of Statistics Series A 26(4):359–372

Author information

Corresponding author

Correspondence to Daniel J. Henderson.

Ethics declarations

Conflict of interests

The authors declare that they have no conflict of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Henderson, D.J., Souto, AC. An Introduction to Nonparametric Regression for Labor Economists. J Labor Res 39, 355–382 (2018). https://doi.org/10.1007/s12122-018-9279-6

  • DOI: https://doi.org/10.1007/s12122-018-9279-6
