Abstract
We present a fundamentally different method of nonparametric regression using clusters and test it against classically established methods. We compare two nonlinear regression estimation packages, ‘NNS’ (Viole, NNS: nonlinear nonparametric statistics, 2016) and ‘np’ (Hayfield and Racine, J Stat Softw 27(5):1–32, 2008), in a simulation using deterministic (DT) and stochastic (ST) regressor models. We find that the respective coefficients of determination \((R^2)\) are close for DT models, while NNS has an advantage in ST and large-sample cases. Regression coefficients are sometimes regarded as approximations to partial derivatives, especially in the social sciences. Of the two packages, only NNS can compute a range of partial derivatives evaluated at points within the sample and also out of sample. Thus NNS can provide a viable alternative to kernel-based nonparametric regression without using bandwidths for smoothing.
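As a hedged illustration of how partial derivatives can be read off a regression built from connected points, the sketch below assumes the fitted function is piecewise linear through a set of regression points (kx, ky) produced by some partitioning step not shown here, and that out-of-sample values extend the first or last segment (an extrapolation assumption). The partial derivative at any point is then the slope of the segment containing it. The R² shown is the textbook 1 − SSE/SST; the exact R² definitions used by ‘NNS’ and ‘np’ in the paper are not given in this excerpt. The function names and regression points are hypothetical, not the NNS API.

```python
from bisect import bisect_right

def segment_index(kx, x0):
    """Index i of the segment [kx[i], kx[i+1]] used at x0. Outside the sample
    range the first or last segment is used (an extrapolation assumption)."""
    i = bisect_right(kx, x0) - 1
    return min(max(i, 0), len(kx) - 2)

def predict(kx, ky, x0):
    """Piecewise-linear prediction through hypothetical regression points (kx, ky)."""
    i = segment_index(kx, x0)
    slope = (ky[i + 1] - ky[i]) / (kx[i + 1] - kx[i])
    return ky[i] + slope * (x0 - kx[i])

def partial_derivative(kx, ky, x0):
    """dy/dx at x0: slope of the segment containing x0
    (the right-hand slope when x0 falls exactly on an interior regression point)."""
    i = segment_index(kx, x0)
    return (ky[i + 1] - ky[i]) / (kx[i + 1] - kx[i])

def r_squared(y, y_hat):
    """Textbook coefficient of determination, 1 - SSE/SST."""
    y_bar = sum(y) / len(y)
    sse = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    sst = sum((a - y_bar) ** 2 for a in y)
    return 1.0 - sse / sst

# Hypothetical regression points (kx must be sorted ascending).
kx = [0.0, 1.0, 3.0, 4.0]
ky = [0.0, 2.0, 2.0, 0.0]
print(partial_derivative(kx, ky, 0.5))   # 2.0, in sample
print(partial_derivative(kx, ky, 5.0))   # -2.0, out of sample (last segment)
```

Because the fit is linear between regression points, the derivative is piecewise constant; no bandwidth enters the calculation.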
Notes
In R, we used the variables x = seq(0, 4*pi, pi/1000) and y = sin(x).
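For readers working outside R, a Python counterpart of this setup is sketched below. It assumes R's seq(from, to, by) semantics, which here give 4000 steps of π/1000 and hence 4001 points from 0 to 4π.

```python
import math

# Python counterpart of the R line  x = seq(0, 4*pi, pi/1000); y = sin(x)
by = math.pi / 1000                      # R's `by` step
x = [k * by for k in range(4001)]        # 0, pi/1000, ..., 4*pi: 4000 steps, 4001 points
y = [math.sin(v) for v in x]             # noise-free sine response
print(len(x))                            # 4001
```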
There is a considerable literature on boundary and endpoint effects in kernel regression of the kind implemented in ‘np’, with no consensus on how best to handle them.
The mean, of course, is the sum of the observations, each weighted by 1/n, where n is the total number of observations. This single-observation weighting, together with the quadrant partitions, eventually leads to the limit condition of NNS partitioning.
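A minimal sketch of the two ideas in this note, under stated assumptions: the mean as a sum with weights 1/n, and recursive splits of (x, y) data into quadrants at each cell's means. quadrant_means is a hypothetical helper; its tie rule and fixed recursion depth are assumptions, and NNS's own partitioning rules may differ. With enough depth and distinct points, every cell ends up holding a single observation, so each cell mean is that observation itself, which is one reading of the limit condition described above.

```python
def mean_as_weighted_sum(values):
    """The mean as a weighted sum: every observation gets weight 1/n."""
    w = 1.0 / len(values)
    return sum(w * v for v in values)

def quadrant_means(points, depth):
    """Hypothetical helper: split a cell of (x, y) points into quadrants at the
    cell's (mean x, mean y), recurse up to `depth` levels, and return the mean
    point of every nonempty leaf cell. Points tied with a mean go to the
    lower/left side (an assumption)."""
    mx = mean_as_weighted_sum([p[0] for p in points])
    my = mean_as_weighted_sum([p[1] for p in points])
    if depth == 0 or len(points) <= 1:
        return [(mx, my)]
    # Quadrants keyed by (above mean x, above mean y); dicts keep insertion order.
    quadrants = {(True, True): [], (False, True): [], (False, False): [], (True, False): []}
    for px, py in points:
        quadrants[(px > mx, py > my)].append((px, py))
    leaves = []
    for cell in quadrants.values():
        if cell:                                   # empty quadrants are skipped
            leaves.extend(quadrant_means(cell, depth - 1))
    return leaves

data = [(1.0, 1.0), (2.0, 4.0), (3.0, 9.0), (4.0, 16.0)]
print(mean_as_weighted_sum([x for x, _ in data]))   # 2.5
print(quadrant_means(data, depth=1))                # [(3.5, 12.5), (1.5, 2.5)]
print(sorted(quadrant_means(data, depth=10)))       # one leaf per observation
```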
The variance is not directly used in NNS regression. Viole and Nawrocki (2012a) demonstrate the equivalence between the covariance matrix and its counterpart derived from NNS partitioning; we mention it here only to further illustrate the robustness of NNS partitioning.
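One elementary identity of this kind can be checked directly, shown here as an illustration rather than as the derivation in Viole and Nawrocki (2012a). Splitting each deviation from the mean into upper and lower parts, (x − x̄)(y − ȳ) = (a⁺ − a⁻)(b⁺ − b⁻), sorts every covariance term into one of the four quadrants around the means. The sketch assumes degree-1 co-partial moments with the means as targets and divisor n; pm_cov and pm_cov_matrix are hypothetical names, not NNS functions.

```python
import statistics

def mean(values):
    return sum(values) / len(values)

def pm_cov(x, y):
    """Population covariance (divisor n) built from degree-1 co-partial moments
    with the means as targets, i.e. from the four quadrants around (mean x, mean y)."""
    mx, my = mean(x), mean(y)
    co_upper = co_lower = divergent = 0.0
    for a, b in zip(x, y):
        au, al = max(a - mx, 0.0), max(mx - a, 0.0)   # upper / lower part of x's deviation
        bu, bl = max(b - my, 0.0), max(my - b, 0.0)   # upper / lower part of y's deviation
        co_upper += au * bu                           # both above their means
        co_lower += al * bl                           # both below their means
        divergent += au * bl + al * bu                # on opposite sides of the means
    return (co_upper + co_lower - divergent) / len(x)

def pm_cov_matrix(columns):
    """Covariance matrix of several variables, entry by entry. On the diagonal,
    pm_cov(x, x) reduces to UPM(2) + LPM(2), which is the variance."""
    return [[pm_cov(u, v) for v in columns] for u in columns]

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 1.0, 4.0, 3.0]
print(pm_cov(x, x), statistics.pvariance(x))   # 1.25 1.25
print(pm_cov_matrix([x, y]))                   # [[1.25, 0.75], [0.75, 1.25]]
```

Multiplying by n/(n − 1) gives the usual sample covariance.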
NNS avoids explicit linear regressions within segments because such regressions cannot reach these limits: a sensible regression fit needs a minimum number of observations. NNS ignores empty segments.
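The reasoning in this note can be illustrated with a small sketch. It assumes segments are half-open x-intervals given by hypothetical edges; NNS derives its own segments from partitioning, which is not reproduced here. A segment holding one observation still has a well-defined mean point, whereas an ordinary least-squares line through it is undefined because a slope needs at least two distinct x values. Empty segments are skipped.

```python
def segment_points(x, y, edges):
    """Hypothetical helper: summarize each half-open segment [edges[k], edges[k+1])
    by its mean point, skip empty segments, and flag whether an OLS line could
    be fitted inside the segment at all (a slope needs two distinct x values)."""
    points = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        xs = [a for a in x if lo <= a < hi]
        ys = [b for a, b in zip(x, y) if lo <= a < hi]
        if not xs:
            continue                              # empty segment: ignored
        ols_possible = len(set(xs)) >= 2
        points.append((sum(xs) / len(xs), sum(ys) / len(ys), ols_possible))
    return points

x = [0.5, 1.25, 1.75, 3.5]
y = [1.0, 2.0, 4.0, 8.0]
for p in segment_points(x, y, edges=[0, 1, 2, 3, 4]):
    print(p)
# (0.5, 1.0, False)
# (1.5, 3.0, True)
# (3.5, 8.0, False)
```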
The R code for the simulation (https://github.com/OVVO-Financial/NNS/tree/Data-and-Simulation-Routines/NNS-Simulation-Routines) and the NNS R routines (https://github.com/OVVO-Financial/NNS/tree/NNS-Beta-Version/R) are both available on GitHub.
References
Bawa, V. S. (1975). Optimal rules for ordering uncertain prospects. Journal of Financial Economics, 2(1), 95–121.
Bellman, R. (1961). On the approximation of curves by line segments using dynamic programming. Communications of the ACM, 4(6), 284. http://dl.acm.org/citation.cfm?doid=366573.366611.
Bock, H.-H. (2008). Origins and extensions of the k-means algorithm in cluster analysis. Journal Électronique d’Histoire des Probabilités et de la Statistique [electronic only], 4(2), 14–181418. http://eudml.org/doc/130880.
Doksum, K., & Samarov, A. (1995). Nonparametric estimation of global functionals and a measure of the explanatory power of covariates in regression. Annals of Statistics, 23(5), 1443–1473. https://projecteuclid.org/euclid.aos/1176324307.
Hayfield, T., & Racine, J. S. (2008). Nonparametric econometrics: The np package. Journal of Statistical Software, 27(5), 1–32. http://www.jstatsoft.org/v27/i05/.
Li, Q., & Racine, J. S. (2007). Nonparametric econometrics. Princeton: Princeton University Press.
Stone, H. (1961). Approximation of curves by line segments. Mathematics of Computation, 15(73), 40–47. http://www.jstor.org/stable/2283327.
Vinod, H. D. (2008). Hands-on intermediate econometrics using R: Templates for extending dozens of practical examples. Hackensack, NJ: World Scientific. ISBN-10: 981-281-885-5. http://www.worldscibooks.com/economics/6895.html.
Vinod, H. D., & Reagle, D. (2005). Preparing for the worst: Incorporating downside risk in stock market investments (monograph). New York: Wiley.
Vinod, H. D., & Ullah, A. (1993). General nonparametric regression estimation and testing in econometrics. In G. S. Maddala, C. R. Rao, & H. D. Vinod (Eds.), Handbook of statistics: Econometrics (Vol. 11, pp. 85–116). New York: North Holland, Elsevier.
Viole, F. (2016a). Beyond correlation: Using the elements of variance for conditional means and probabilities. SSRN eLibrary. http://ssrn.com/abstract=2745308.
Viole, F. (2016b). NNS: Nonlinear nonparametric statistics. R package version 0.3.3. https://CRAN.R-project.org/package=NNS.
Viole, F., & Nawrocki, D. (2012a). Cumulative distribution functions and UPM/LPM analysis. SSRN eLibrary. http://ssrn.com/abstract=2148482.
Viole, F., & Nawrocki, D. (2012b). Deriving nonlinear correlation coefficients from partial moments. SSRN eLibrary. http://ssrn.com/abstract=2148522.
Acknowledgements
We would like to thank the anonymous referees for providing valuable commentary on earlier versions of this manuscript.
Cite this article
Vinod, H.D., Viole, F. Nonparametric Regression Using Clusters. Comput Econ 52, 1317–1334 (2018). https://doi.org/10.1007/s10614-017-9713-5