Abstract
In this article, we provide an overview of nonparametric (spline and kernel) regression methods and illustrate how they may be used in labor economics applications. We focus on issues commonly found in the labor literature, such as how to account for endogeneity via instrumental variables in a nonparametric setting. We showcase these methods using data from the Current Population Survey.
Data Availability
Upon publication, all data and R code used in this paper will be available on the publisher’s website.
Notes
Fixing the sample to college educated males allows us to plot these figures in two dimensions.
See https://stat.ethz.ch/R-manual/R-devel/library/splines/html/bs.html and the seemingly equivalent bSpline(⋅) function in the splines2 package.
The matrix of the coefficients being
Note that the last term \(-\lambda^{2}C\) disappears as it does not influence the solution.
We raise λ to the power of 2p due to the way we add bases. Intuitively, raising λ to the power of 2p can be explained by the following example: if we transform X into αX for any α > 0, we want to have the equivalent transformation done on the smoothing parameter λ → αλ to get the same fit.
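The scaling argument in this note can be written out in one line. The sketch below assumes a truncated power basis of degree p with knots \(\kappa_{k}\) and penalized coefficients \(\beta_{pk}\); these symbols are introduced here for illustration and may differ from the notation in the main text. Rescaling both the regressor and the knots by \(\alpha > 0\) gives

\[
(\alpha x - \alpha\kappa_{k})_{+}^{p} = \alpha^{p}(x - \kappa_{k})_{+}^{p}
\quad\Rightarrow\quad
\tilde{\beta}_{pk} = \alpha^{-p}\beta_{pk},
\]

\[
\tilde{\lambda}^{2p}\sum_{k}\tilde{\beta}_{pk}^{2}
= \left(\frac{\tilde{\lambda}}{\alpha}\right)^{2p}\sum_{k}\beta_{pk}^{2}
= \lambda^{2p}\sum_{k}\beta_{pk}^{2}
\iff \tilde{\lambda} = \alpha\lambda .
\]

Under these assumptions the fitted values are unchanged when the coefficients are rescaled by \(\alpha^{-p}\), so the penalized criterion, and hence the fit, is the same exactly when the smoothing parameter is rescaled to \(\alpha\lambda\).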
Montoya et al. (2014) use a simulation to test the performance of different knot selection methods with equidistant knots in a P-spline model. Specifically, they compare the methods presented in Ruppert et al. (2003) with the myopic algorithm knot selection method and the full search algorithm knot selection method. Their results show that the default choice method performs as well as or better than the other methods under several commonly used smoothing parameter selection methods.
While there is no theoretical justification for doing so, it is common to use rule-of-thumb methods designed for density estimation as a form of exploratory analysis. In fact, we used a rule-of-thumb to compute the bandwidth in our previous examples (“Nonparametric Regression”). In its general form, the bandwidth (designed for Gaussian densities with a Gaussian kernel) is \(h_{rot} = 1.06\sigma_{x}n^{-1/5}\), where \(\sigma_{x}\) is the standard deviation of x. For the remainder of the article, we will use bandwidths selected via cross-validation.
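As a rough illustration of the rule-of-thumb bandwidth, the Python sketch below computes 1.06 times the sample standard deviation of x times n^(-1/5) and plugs it into a local-constant (Nadaraya-Watson) estimator with a Gaussian kernel. The function names, the simulated "experience" and "log wage" variables, and the data-generating process are all hypothetical and chosen only for this example; this is not the R code that accompanies the article.

```python
import numpy as np

def rule_of_thumb_bandwidth(x):
    """Rule-of-thumb bandwidth: 1.06 * sd(x) * n^(-1/5) (Gaussian reference)."""
    x = np.asarray(x, dtype=float)
    return 1.06 * x.std(ddof=1) * x.size ** (-1 / 5)

def nadaraya_watson(x, y, x0, h):
    """Local-constant estimate of E[y | x = x0] with a Gaussian kernel."""
    x, y, x0 = (np.asarray(a, dtype=float) for a in (x, y, x0))
    u = (x0[:, None] - x[None, :]) / h      # scaled distances, shape (m, n)
    w = np.exp(-0.5 * u**2)                 # Gaussian kernel weights
    return (w @ y) / w.sum(axis=1)          # weighted average of y at each x0

# Hypothetical simulated data (not the CPS sample used in the article).
rng = np.random.default_rng(0)
x = rng.uniform(0, 40, size=500)                                   # "experience"
y = 2.0 + 0.08 * x - 0.0015 * x**2 + rng.normal(scale=0.3, size=500)  # "log wage"

h = rule_of_thumb_bandwidth(x)
grid = np.linspace(5, 35, 7)
fit = nadaraya_watson(x, y, grid, h)
```

The rule of thumb is used here only as a quick exploratory choice; a data-driven bandwidth such as one selected by cross-validation would replace `h` in the call to `nadaraya_watson`.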
Those IVs include, but are not limited to: minimum school-leaving age, quarter of birth, school costs, proximity to schools, loan policies, school reforms, spouse’s and parents’ education/income.
Recall that in our previous examples, the level of education is fixed at 16 years – college degree.
Multiple endogenous regressors can be handled by running a separate first-stage regression for each of them, including the residuals from all of those regressions in the second-stage regression, and finally summing over i to obtain the conditional mean estimates.
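The three steps in this note can be sketched in Python as follows. This is a minimal illustration under several assumptions, not the estimator used in the article: it swaps in a local-constant (Nadaraya-Watson) smoother with a product Gaussian kernel for both stages, uses an ad hoc rule-of-thumb bandwidth, averages (rather than sums) over i, and runs on simulated data. The names `nw`, `rot`, `Z`, `X`, and `V_hat` are hypothetical.

```python
import numpy as np

def nw(X, y, X0, h):
    """Nadaraya-Watson estimate of E[y | X = X0], product Gaussian kernel.
    X: (n, d) training points, X0: (m, d) evaluation points, h: (d,) bandwidths."""
    u = (X0[:, None, :] - X[None, :, :]) / h
    w = np.exp(-0.5 * (u**2).sum(axis=2))
    return (w @ y) / w.sum(axis=1)

def rot(X):
    """Ad hoc rule-of-thumb bandwidths, one per column (an assumption here)."""
    n, d = X.shape
    return 1.06 * X.std(axis=0, ddof=1) * n ** (-1 / (4 + d))

# Simulated data: two endogenous regressors, two instruments (all hypothetical).
rng = np.random.default_rng(1)
n = 300
Z = rng.normal(size=(n, 2))                  # instruments
V = rng.normal(scale=0.5, size=(n, 2))       # first-stage errors
X = Z + V                                    # endogenous regressors
eps = V.sum(axis=1) + rng.normal(scale=0.2, size=n)  # error correlated with V
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + eps

# Step 1: a separate first-stage regression for each endogenous regressor.
V_hat = np.column_stack(
    [X[:, j] - nw(Z, X[:, j], Z, rot(Z)) for j in range(X.shape[1])]
)

# Step 2: second stage regresses y on the regressors and all residuals.
XV = np.column_stack([X, V_hat])
h2 = rot(XV)

# Step 3: at each evaluation point, average the second-stage fit over the
# observed residuals (i = 1, ..., n) to recover the conditional mean.
grid = np.array([[-1.0, 0.0], [0.0, 0.0], [1.0, 0.0]])
m_hat = np.empty(len(grid))
for k, x0 in enumerate(grid):
    pts = np.column_stack([np.tile(x0, (n, 1)), V_hat])
    m_hat[k] = nw(XV, y, pts, h2).mean()
```

Averaging instead of summing only changes the result by the factor 1/n. With four conditioning variables in the second stage, the estimates from a sample this small are noisy, so the sketch is meant to show the order of operations rather than to deliver accurate numbers.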
The acceptable range for γ is between \(\left (2\left (p_{2} + 1\right )+q_{1}+ 1\right )^{-1}\max \left [\frac {p_{2}+ 1}{p_{1}+ 1},\frac {p_{2}+ 3}{2\left (p_{1}+ 1\right )}\right ]\) and \(\left (2\left (p_{2} + 1\right )+q_{1}+ 1\right )^{-1}\frac {p_{2}+q_{1}}{q_{1}+q_{2}}\), where q1 and q2 represent the number of elements in the first and second stage regressions, respectively.
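To see what this range looks like numerically, the short Python sketch below transcribes the two bounds exactly as written in this note. The function name `gamma_bounds` is hypothetical, and the note itself does not define p1 and p2; the example assumes they are the polynomial orders used in the first and second stages, which should be checked against the main text.

```python
def gamma_bounds(p1, p2, q1, q2):
    """Lower and upper bounds for gamma, transcribed from the note.
    p1, p2: assumed to be first- and second-stage polynomial orders.
    q1, q2: number of elements in the first- and second-stage regressions."""
    scale = 1.0 / (2 * (p2 + 1) + q1 + 1)
    lower = scale * max((p2 + 1) / (p1 + 1), (p2 + 3) / (2 * (p1 + 1)))
    upper = scale * (p2 + q1) / (q1 + q2)
    return lower, upper

# Illustrative values only: p1 = 3, p2 = 1, q1 = 1, q2 = 2.
lower, upper = gamma_bounds(p1=3, p2=1, q1=1, q2=2)

# With p1 = 1 instead, the lower bound exceeds the upper bound,
# so the range is empty for that combination.
lower_alt, upper_alt = gamma_bounds(p1=1, p2=1, q1=1, q2=2)
```

For the first set of illustrative values the bounds work out to 1/12 and 1/9. The second call shows that not every combination of orders and dimensions leaves a non-empty interval, so it is worth evaluating the bounds before fixing γ.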
We could combine spline and kernel methods to obtain an IV estimator as in Ozabaci et al. (2014). Using this combination allows for a lower computational burden and oracally efficient estimates.
References
Cameron AC, Trivedi PK (2010) Microeconometrics using Stata, vol 2. Stata Press, College Station
Eilers PHC, Marx BD (1996) Flexible smoothing with B-splines and penalties. Stat Sci 11(2):89–102
Eilers PHC, Marx BD (2010) Splines, knots, and penalties. Wiley Interdiscip Rev Comput Stat 2(6):637–653
Hall P, Racine JS (2015) Infinite-order cross-validated local polynomial regression. J Econom 185:510–525
Hayfield T, Racine JS (2008) Nonparametric econometrics: the np package. J Stat Softw 27:1–32
Henderson DJ, Parmeter CF (2015) Applied nonparametric econometrics. Cambridge University Press, Cambridge
Henderson DJ, Polachek SW, Wang L (2011) Heterogeneity in schooling rates of return. Econ Educ Rev 30:1202–1214
Hutchinson MF, De Hoog FR (1985) Smoothing noisy data with spline functions. Numer Math 47(1):99–106
Ma S, Racine JS, Yang L (2015) Spline regression in the presence of categorical predictors. J Appl Econom 30:705–717
Montoya EL, Ulloa N, Miller V (2014) A simulation study comparing knot selection methods with equally spaced knots in a penalized regression spline. Int J Stat Probab 3(3):96
Nadaraya EA (1964) On estimating regression. Theory Probab Its Appl 9(1):141–142
Newey WK, Powell JL, Vella F (1999) Nonparametric estimation of triangular simultaneous equations models. Econometrica 67(3):565–603
Ozabaci D, Henderson DJ, Su L (2014) Additive nonparametric regression in the presence of endogenous regressors. J Bus Econ Stat 32(4):555–575
Ruppert D, Wand MP, Carroll RJ (2003) Semiparametric regression. Cambridge University Press, Cambridge
Su L, Ullah A (2008) Local polynomial estimation of nonparametric simultaneous equations models. J Econom 144:193–218
Watson GS (1964) Smooth regression analysis. Sankhyā: The Indian Journal of Statistics Series A 26(4):359–372
Ethics declarations
Conflict of interests
The authors declare that they have no conflict of interest.
About this article
Cite this article
Henderson, D.J., Souto, AC. An Introduction to Nonparametric Regression for Labor Economists. J Labor Res 39, 355–382 (2018). https://doi.org/10.1007/s12122-018-9279-6