Nonparametric Regression Using Clusters

Abstract

We present a novel method of nonparametric regression based on clusters and test it against established classical methods. We compare two R packages for nonlinear regression estimation, ‘NNS’ (Viole, NNS: nonlinear nonparametric statistics, 2016) and ‘np’ (Hayfield and Racine, J Stat Softw 27(5):1–32, 2008), in a simulation using deterministic (DT) and stochastic (ST) regressor models. We find that the coefficients of determination \((R^2)\) of the two packages are close for DT models, while NNS has an advantage in ST and large-sample cases. Regression coefficients are sometimes regarded as approximations to partial derivatives, especially in the social sciences. Of the two packages, only NNS can compute a range of partial derivatives evaluated at points both within the sample and out of sample. NNS thus provides a viable alternative to kernel-based nonparametric regression that does not require smoothing bandwidths.
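A minimal Python sketch of regression by partitioning, rather than kernel smoothing, may help fix ideas. It is not the NNS implementation (NNS is an R package). It assumes a single regressor, recursive splits at the mean of x within each segment, and straight lines joining the segment centroids; every function name in it is hypothetical. The slope of the line segment containing a point serves as the partial derivative estimate, and the end segments are extended linearly so that partials can also be evaluated out of sample. No bandwidth appears; the only tuning choice in this sketch is the number of splitting rounds, depth.

```python
import numpy as np


def mean_split_centroids(x, y, depth=None):
    """Recursively split the sample at the mean of x and return the
    (mean x, mean y) centroid of each resulting segment, ordered by x.

    depth=None keeps splitting until no segment can be split further,
    i.e. each segment holds observations sharing a single x value.
    """
    segments = [(x, y)]
    level = 0
    while depth is None or level < depth:
        new_segments, split_any = [], False
        for xs, ys in segments:
            left = xs <= xs.mean()
            if left.all() or not left.any():
                new_segments.append((xs, ys))  # cannot be split further
                continue
            new_segments.append((xs[left], ys[left]))
            new_segments.append((xs[~left], ys[~left]))
            split_any = True
        segments = new_segments
        level += 1
        if not split_any:
            break
    # Segments stay in left-to-right order, so centroid x values increase.
    cx = np.array([xs.mean() for xs, _ in segments])
    cy = np.array([ys.mean() for _, ys in segments])
    return cx, cy


def predict(x_new, cx, cy):
    """Piecewise-linear fit through the centroids (assumes at least two),
    extended linearly beyond the outermost centroids."""
    x_new = np.asarray(x_new, dtype=float)
    yhat = np.interp(x_new, cx, cy)
    lo_slope = (cy[1] - cy[0]) / (cx[1] - cx[0])
    hi_slope = (cy[-1] - cy[-2]) / (cx[-1] - cx[-2])
    yhat = np.where(x_new < cx[0], cy[0] + lo_slope * (x_new - cx[0]), yhat)
    yhat = np.where(x_new > cx[-1], cy[-1] + hi_slope * (x_new - cx[-1]), yhat)
    return yhat


def partial_derivative(x_new, cx, cy):
    """Slope of the fitted segment containing each point; the end segments
    are reused for points outside the sample range."""
    slopes = np.diff(cy) / np.diff(cx)
    idx = np.searchsorted(cx, x_new, side="right") - 1
    return slopes[np.clip(idx, 0, slopes.size - 1)]


def r_squared(y, yhat):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - yhat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Usage with the sine data of Note 1 (x = seq(0, 4*pi, pi/1000) in R).
x = np.arange(4001) * (np.pi / 1000)
y = np.sin(x)
cx, cy = mean_split_centroids(x, y)              # limit case: one point per segment
print(r_squared(y, predict(x, cx, cy)))
cx4, cy4 = mean_split_centroids(x, y, depth=4)   # 16 segments: a smoother fit
print(r_squared(y, predict(x, cx4, cy4)))
# Partials at an in-sample point and at an out-of-sample point beyond 4*pi.
print(partial_derivative(np.array([0.5, 4 * np.pi + 0.1]), cx, cy))
```

With depth left at None, every segment ends up holding a single observation, so the fitted line passes through each data point and the in-sample R^2 is 1, a rough analogue of the limit condition described in Note 3. A finite depth, such as 4 here, averages within segments and gives a smoother fit with R^2 below 1.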


Figs. 1–4 (images not available)

Notes

  1. In R, we used the variables: x = seq(0, 4*pi, pi/1000); y = sin(x)

  2. There is a considerable literature on boundary and endpoint effects relevant to ‘np’, with no consensus on how best to handle them.

  3. The mean is, of course, the sum of the observations, each weighted by 1/n, where n is the number of observations. Weighting single observations and repeatedly partitioning into quadrants eventually leads to the limit condition of NNS partitioning.

  4. The variance is not used directly in NNS regression. Viole and Nawrocki (2012a) demonstrate that the covariance matrix can be recovered from NNS partitioning; we mention it here to further illustrate the robustness of NNS partitioning.

  5. NNS avoids fitting explicit linear regressions within segments, because such regressions cannot reach these limits: a sensible regression fit requires a minimum number of observations per segment. NNS ignores empty segments.

  6. The R code for the simulation is available on GitHub: https://github.com/OVVO-Financial/NNS/tree/Data-and-Simulation-Routines/NNS-Simulation-Routines
     The NNS R routines are also available on GitHub: https://github.com/OVVO-Financial/NNS/tree/NNS-Beta-Version/R
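Notes 3 to 5 mention quadrant partitioning about the means, the covariance equivalence shown by Viole and Nawrocki (2012a), and the handling of empty segments. The Python sketch below illustrates these points under our own assumptions: a single partition about (mean of x, mean of y), observations lying exactly on a mean assigned to the lower side, and degree-1 co-partial moments scaled by 1/n. The function names are hypothetical and the code is not taken from NNS. Because each product of deviations (x_i − x̄)(y_i − ȳ) falls entirely in one quadrant, the population covariance equals the two concordant quadrant terms minus the two divergent ones.

```python
import numpy as np


def quadrant_centroids(x, y):
    """Split (x, y) into four quadrants about (mean x, mean y) and return the
    centroid of each non-empty quadrant. Points on a mean go to the lower side."""
    mx, my = x.mean(), y.mean()
    masks = {
        "upper-right": (x > mx) & (y > my),
        "lower-left": (x <= mx) & (y <= my),
        "lower-right": (x > mx) & (y <= my),
        "upper-left": (x <= mx) & (y > my),
    }
    # Empty quadrants are skipped rather than given a centroid.
    return {name: (x[m].mean(), y[m].mean()) for name, m in masks.items() if m.any()}


def quadrant_moments(x, y):
    """Degree-1 co-partial moments about the means, each scaled by 1/n."""
    dx, dy = x - x.mean(), y - y.mean()
    x_up, x_lo = np.maximum(dx, 0.0), np.maximum(-dx, 0.0)  # above / below mean
    y_up, y_lo = np.maximum(dy, 0.0), np.maximum(-dy, 0.0)
    n = x.size
    return {
        "both_above": np.sum(x_up * y_up) / n,       # concordant, upper-right
        "both_below": np.sum(x_lo * y_lo) / n,       # concordant, lower-left
        "x_above_y_below": np.sum(x_up * y_lo) / n,  # divergent, lower-right
        "x_below_y_above": np.sum(x_lo * y_up) / n,  # divergent, upper-left
    }


def covariance_from_quadrants(x, y):
    """Population covariance rebuilt from the four quadrant terms."""
    q = quadrant_moments(x, y)
    return q["both_above"] + q["both_below"] - q["x_above_y_below"] - q["x_below_y_above"]


# Usage: compare with numpy's population covariance (bias=True divides by n).
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 0.5 * x + rng.normal(size=500)
print(covariance_from_quadrants(x, y), np.cov(x, y, bias=True)[0, 1])

# A perfectly monotone sample leaves the two divergent quadrants empty.
z = np.array([1.0, 2.0, 3.0, 4.0])
print(quadrant_centroids(z, z))
```

The same quadrant split could be repeated inside each non-empty quadrant to refine the partition; this sketch stops after one level.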

References

  1. Bawa, V. S. (1975). Optimal rules for ordering uncertain prospects. Journal of Financial Economics, 2(1), 95–121.


  2. Bellman, R. (1961). On the approximation of curves by line segments using dynamic programming. Communications of the ACM, 4(6), 284. http://dl.acm.org/citation.cfm?doid=366573.366611.


  3. Bock, H.-H. (2008). Origins and extensions of the k-means algorithm in cluster analysis. Journal Électronique d’Histoire des Probabilités et de la Statistique [electronic only], 4(2), 14–181418. http://eudml.org/doc/130880.

  4. Doksum, K., & Samarov, A. (1995). Nonparametric estimation of global functionals and a measure of the explanatory power of covariates in regression. Annals of Statistics, 23(5), 1443–1473. https://projecteuclid.org/euclid.aos/1176324307.


  5. Hayfield, T., & Racine, J. S. (2008). Nonparametric econometrics: The np package. Journal of Statistical Software, 27(5), 1–32. http://www.jstatsoft.org/v27/i05/.

  6. Li, Q., & Racine, J. S. (2007). Nonparametric econometrics. Princeton: Princeton University Press.


  7. Stone, H. (1961). Approximation of curves by line segments. Mathematics of Computation, 15(73), 40–47. http://www.jstor.org/stable/2283327.


  8. Vinod, H. D. (2008). Hands-on intermediate econometrics using R: Templates for extending dozens of practical examples. Hackensack, NJ: World Scientific. ISBN 10-981-281-885-5. http://www.worldscibooks.com/economics/6895.html.

  9. Vinod, H. D., & Reagle, D. (2005). Preparing for the worst: Incorporating downside risk in stock market investments (monograph). New York: Wiley.


  10. Vinod, H. D., & Ullah, A. (1993). General nonparametric regression estimation and testing in econometrics. In G. S. Maddala, C. R. Rao, & H. D. Vinod (Eds.), Handbook of statistics: Econometrics (Vol. 11, pp. 85–116). New York: North Holland, Elsevier.


  11. Viole, F. (2016a). Beyond correlation: Using the elements of variance for conditional means and probabilities. SSRN eLibrary. http://ssrn.com/abstract=2745308.

  12. Viole, F. (2016b). NNS: Nonlinear nonparametric statistics. R package version 0.3.3. https://CRAN.R-project.org/package=NNS.

  13. Viole, F., & Nawrocki, D. (2012a). Cumulative distribution functions and UPM/LPM analysis. SSRN eLibrary. http://ssrn.com/abstract=2148482.

  14. Viole, F., & Nawrocki, D. (2012b). Deriving nonlinear correlation coefficients from partial moments. SSRN eLibrary. http://ssrn.com/abstract=2148522.


Acknowledgements

We would like to thank the anonymous referees for providing valuable commentary on earlier versions of this manuscript.

Author information


Corresponding author

Correspondence to Hrishikesh D. Vinod.


About this article


Cite this article

Vinod, H.D., Viole, F. Nonparametric Regression Using Clusters. Comput Econ 52, 1317–1334 (2018). https://doi.org/10.1007/s10614-017-9713-5


Keywords

  • Curve fitting
  • Derivative estimation
  • Partitioning without smoothing
  • Sufficiency
  • Perfect fit
  • Simulation