
Nonparametric multiple regression by projection on non-compactly supported bases

Annals of the Institute of Statistical Mathematics

Abstract

We study the nonparametric regression estimation problem with a random design in \({\mathbb{R}}^{p}\) with \(p\ge 2\). We do so using a projection estimator obtained by least squares minimization. Our contribution is to consider non-compact estimation domains in \({\mathbb {R}}^{p}\), on which we recover the function, and to provide a theoretical study of the risk of the estimator relative to a norm weighted by the distribution of the design. We propose a model selection procedure in which the model collection is random and takes into account the discrepancy between the empirical norm and the norm associated with the distribution of the design. We prove that the resulting estimator automatically optimizes the bias-variance trade-off in both norms, and we illustrate the numerical performance of our procedure on simulated data.


Notes

  1. In general, it is a semi-norm, but we will only consider subspaces on which it is a norm.

References

  • Arlot, S., Massart, P. (2009). Data-driven calibration of penalties for least-squares regression. Journal of Machine Learning Research, 10(10), 245–279.

  • Baraud, Y. (2000). Model selection for regression on a fixed design. Probability Theory and Related Fields, 117(4), 467–493.

  • Baraud, Y. (2002). Model selection for regression on a random design. ESAIM: Probability and Statistics, 6, 127–146.

  • Barron, A., Birgé, L., Massart, P. (1999). Risk bounds for model selection via penalization. Probability Theory and Related Fields, 113(3), 301–413.

  • Birgé, L., Massart, P. (1998). Minimum contrast estimators on sieves: Exponential bounds and rates of convergence. Bernoulli, 4(3), 329–375.

  • Cohen, A., Davenport, M. A., Leviatan, D. (2013). On the stability and accuracy of least squares approximations. Foundations of Computational Mathematics, 13(5), 819–834.

  • Comte, F., Genon-Catalot, V. (2018). Laguerre and Hermite bases for inverse problems. Journal of the Korean Statistical Society, 47(3), 273–296.

  • Comte, F., Genon-Catalot, V. (2020a). Regression function estimation as a partly inverse problem. Annals of the Institute of Statistical Mathematics, 72(4), 1023–1054.

  • Comte, F., Genon-Catalot, V. (2020b). Regression function estimation on non compact support in a heteroscedastic model. Metrika, 83(1), 93–128.

  • Comte, F., Marie, N. (2021). On a Nadaraya–Watson estimator with two bandwidths. Electronic Journal of Statistics, 15(1), 2566–2607.

  • Efromovich, S. (1999). Nonparametric curve estimation: Methods, theory and applications. Springer series in statistics. New York: Springer.

  • Gittens, A., Tropp, J. A. (2011). Tail bounds for all eigenvalues of a sum of random matrices. arXiv:1104.4513 [math].

  • Györfi, L., Kohler, M., Krzyżak, A., Walk, H. (2002). A distribution-free theory of nonparametric regression. Springer series in statistics. New York: Springer.

  • Härdle, W., Marron, J. S. (1985). Optimal bandwidth selection in nonparametric regression function estimation. The Annals of Statistics, 13(4), 1465–1481.

  • Köhler, M., Schindler, A., Sperlich, S. (2014). A review and comparison of bandwidth selection methods for kernel regression. International Statistical Review, 82(2), 243–274.

  • Lacour, C., Massart, P., Rivoirard, V. (2017). Estimator selection: A new method with applications to kernel density estimation. Sankhya A, 79(2), 298–335.

  • Mabon, G. (2017). Adaptive deconvolution on the non-negative real line. Scandinavian Journal of Statistics, 44(3), 707–740.

  • Nadaraya, E. A. (1964). On estimating regression. Theory of Probability & Its Applications, 9(1), 141–142.

  • Sacko, O. (2020). Hermite density deconvolution. Latin American Journal of Probability and Mathematical Statistics, 17(1), 419–443.

  • Tao, T. (2008). The divisor bound. https://terrytao.wordpress.com/2008/09/23/the-divisor-bound.

  • Tropp, J. A. (2012). User-friendly tail bounds for sums of random matrices. Foundations of Computational Mathematics, 12(4), 389–434.

  • Tsybakov, A. B. (2009). Introduction to nonparametric estimation. Series in statistics. London: Springer.

  • Watson, G. S. (1964). Smooth regression analysis. Sankhyā: The Indian Journal of Statistics, Series A, 26(4), 359–372.

Download references

Acknowledgements

I want to thank Fabienne Comte and Céline Duval for their helpful advice and their support of my work. I also want to thank Florence Merlevède for her help with the second inequality of the Matrix Chernoff bound. Finally, I want to thank Herb Susmann for proofreading this article.

Author information

Correspondence to Florian Dussap.

Ethics declarations

Conflict of interest

The author declares that they have no conflict of interest.

Additional information


This work was supported by a grant from Région Île-de-France.

Appendices

A Linear algebra

Lemma 8

Let E be a Euclidean vector space and let \(\ell :E\rightarrow {\mathbb {R}}^n\) be an injective linear map. For \(y\in {\mathbb {R}}^n\), the solution of the problem:

$$\begin{aligned} {\hat{a}} {:}{=} {{\,\mathrm{arg\, min}\,}}_{a\in E} \,\left| \left| y - \ell (a)\right| \right| _{{\mathbb {R}}^n}^2 \end{aligned}$$

is given by:

$$\begin{aligned} {\hat{a}} = \left[ (\ell ^* \circ \ell )^{-1} \circ \ell ^*\right] (y), \end{aligned}$$

where \(\ell ^*:{\mathbb {R}}^n\rightarrow E\) is characterized by the relation \({\langle }{ y, \ell (a) }{\rangle }_{{\mathbb {R}}^n} = {\langle }{ \ell ^*(y), a }{\rangle }_{E}\).
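As a concrete instance (a minimal numerical sketch, not from the paper: we take \(E={\mathbb {R}}^d\) and \(\ell (a)=\textbf{X}a\) for a full-column-rank matrix \(\textbf{X}\), so that \(\ell ^*\) is \(\textbf{X}^\top \) and the formula reduces to the normal equations), the lemma can be checked against a generic least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 8, 3
X = rng.standard_normal((n, d))  # full column rank a.s., so l is injective
y = rng.standard_normal(n)

# Lemma 8 with l(a) = X a and l* = X^T: a_hat = (X^T X)^{-1} X^T y
a_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against a generic least-squares solver
a_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(a_hat, a_lstsq)
```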

Lemma 9

Let \(\textbf{A}\), \(\textbf{B}\) be square matrices. If \(\textbf{A}\) is invertible and \(\left| \left| \textbf{A}^{-1}\textbf{B} \right| \right| _{\textrm{op}}<1\), then \(\textbf{A} +\textbf{B}\) is invertible and it holds:

$$\begin{aligned} \left| \left| (\textbf{A} + {\textbf{B}})^{-1} - \textbf{A}^{-1}\right| \right| _{\textrm{op}} \le \frac{\left| \left| {\textbf{A}}^{-1}\right| \right| _{\textrm{op}}^2 \left| \left| {\textbf{B}}\right| \right| _{\textrm{op}}}{1 - \left| \left| {\textbf{A}}^{-1}{\textbf{B}}\right| \right| _{\textrm{op}}}. \end{aligned}$$
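A quick numerical sanity check of this bound (a sketch with illustrative choices: \(\textbf{A}\) a small perturbation of the identity and \(\textbf{B}\) a small random matrix, so that the hypothesis \(\Vert \textbf{A}^{-1}\textbf{B}\Vert _{\textrm{op}}<1\) holds):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
A = np.eye(d) + 0.1 * rng.standard_normal((d, d))  # invertible perturbation of I
B = 0.05 * rng.standard_normal((d, d))

def op_norm(M):
    """Operator (spectral) norm."""
    return np.linalg.norm(M, 2)

A_inv = np.linalg.inv(A)
assert op_norm(A_inv @ B) < 1  # hypothesis of Lemma 9

lhs = op_norm(np.linalg.inv(A + B) - A_inv)
rhs = op_norm(A_inv) ** 2 * op_norm(B) / (1 - op_norm(A_inv @ B))
assert lhs <= rhs
```

The bound itself follows from the identity \((\textbf{A}+\textbf{B})^{-1}-\textbf{A}^{-1} = -(\textbf{A}+\textbf{B})^{-1}\textbf{B}\textbf{A}^{-1}\) combined with the Neumann-series estimate \(\Vert (\textbf{A}+\textbf{B})^{-1}\Vert _{\textrm{op}} \le \Vert \textbf{A}^{-1}\Vert _{\textrm{op}}/(1-\Vert \textbf{A}^{-1}\textbf{B}\Vert _{\textrm{op}})\).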

B Concentration inequalities

Proofs of the following bounds can be found in Tropp (2012) and Gittens and Tropp (2011).

Theorem 5

(Matrix Chernoff bound) Let \({\textbf{Z}}_1, \dotsc , {\textbf{Z}}_n\) be independent random self-adjoint positive semi-definite matrices with dimension d, such that \(\sup _k \lambda _{\max }({\textbf{Z}}_k) \le R\) a.s. If we define:

$$\begin{aligned} \mu _{\min } {:}{=} \lambda _{\min }\!\left( \sum _{k=1}^{n} {\mathbb {E}}[{\textbf{Z}}_k] \right) , \end{aligned}$$

then we have:

$$\begin{aligned} \forall \delta \in (0,1),\quad {\mathbb {P}}\left[ \lambda _{\min }\!\left( \sum _{k=1}^{n} {\textbf{Z}}_k \right) \le (1-\delta ) \mu _{\min } \right] \le d\times \left( \frac{\textrm{e}^{-\delta }}{(1-\delta )^{(1-\delta )}} \right) ^{\mu _{\min } / R}, \end{aligned}$$
(29)
$$\begin{aligned} \forall \delta >0,\quad {\mathbb {P}}\left[ \lambda _{\min }\!\left( \sum _{k=1}^{n} {\textbf{Z}}_k \right) \ge (1+\delta ) \mu _{\min } \right] \le \left( \frac{\textrm{e}^{\delta }}{(1+\delta )^{(1+\delta )}} \right) ^{\mu _{\min } / R}. \end{aligned}$$
(30)
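A Monte Carlo illustration of the lower-tail bound (29) (a sketch, not part of the paper: we take \({\textbf{Z}}_k\) diagonal with independent Bernoulli(1/2) entries, so \(R=1\) and \(\mu _{\min }=n/2\)):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, R = 40, 3, 1.0
mu_min = n / 2          # sum of E[Z_k] equals (n/2) I, so mu_min = n/2
delta = 0.5

trials, hits = 2000, 0
for _ in range(trials):
    # Z_k diagonal with Bernoulli(1/2) entries: PSD, lambda_max(Z_k) <= 1
    S = rng.integers(0, 2, size=(n, d)).sum(axis=0)  # diagonal of sum(Z_k)
    hits += S.min() <= (1 - delta) * mu_min

empirical = hits / trials
chernoff = d * (np.exp(-delta) / (1 - delta) ** (1 - delta)) ** (mu_min / R)
assert empirical <= chernoff
```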

Theorem 6

(Matrix Bernstein bound) Let \({\textbf{Z}}_1, \dotsc , {\textbf{Z}}_n\) be independent random self-adjoint matrices with dimension d, such that \({\mathbb {E}}[{\textbf{Z}}_k] = \textbf{0}\) and \(\sup _k \lambda _{\max }({\textbf{Z}}_k) \le R\) a.s. If \(v>0\) is such that:

$$\begin{aligned} \left| \left| \sum _{k=1}^{n} {\mathbb {E}}\left[ {\textbf{Z}}_k^2\right] \right| \right| _\textrm{op}\le v, \end{aligned}$$

then for all \(x>0\) we have:

$$\begin{aligned} {\mathbb {P}}\left[ \lambda _{\max } \!\left( \sum _{k=1}^{n} {\textbf{Z}}_k \right) \ge x \right] \le d\times \exp \left( \frac{-x^2/2}{v + \frac{R}{3}x} \right) . \end{aligned}$$
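The same kind of Monte Carlo sanity check works for the Bernstein bound (again a sketch with illustrative choices: \({\textbf{Z}}_k\) diagonal with independent \(\pm 1\) entries, so \(R=1\) and \(\Vert \sum _k {\mathbb {E}}[{\textbf{Z}}_k^2]\Vert _{\textrm{op}}=n\)):

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, R = 50, 3, 1.0
v = float(n)            # here Z_k^2 = I, so ||sum E[Z_k^2]||_op = n
x = 25.0

trials, hits = 2000, 0
for _ in range(trials):
    # Z_k diagonal with independent +-1 entries: mean zero, lambda_max(Z_k) <= 1
    S = rng.choice([-1.0, 1.0], size=(n, d)).sum(axis=0)  # diagonal of sum(Z_k)
    hits += S.max() >= x

empirical = hits / trials
bernstein = d * np.exp(-(x ** 2) / 2 / (v + R * x / 3))
assert empirical <= bernstein
```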

C Combinatorics

Proposition 4

For \(n\ge 1\) and \(p\ge 2\) we have:

$$\begin{aligned} {{\,\textrm{Card}\,}}{\lbrace }{ {\varvec{m}}\in {\mathbb {N}}_+^p \,\vert \, m_1\cdots m_p \le n }{\rbrace } \le n\, H_n^{p-1}, \end{aligned}$$

where \(H_n {:}{=} \sum _{k=1}^{n} \frac{1}{k}\) is the n-th harmonic number.

Proof

We compute:

$$\begin{aligned} {{\,\textrm{Card}\,}}{\lbrace }{ {\varvec{m}}\in {\mathbb {N}}_+^p \,\vert \, D_{{\varvec{m}}} \le n }{\rbrace }&= \sum _{m_1=1}^{n} \cdots \sum _{m_p=1}^{n} \textbf{1}_{m_1\cdots m_p \le n} \\&= \sum _{m_1=1}^{n} \cdots \sum _{m_p=1}^{n} \textbf{1}_{m_p \le \frac{n}{m_1\cdots m_{p-1}}} \\&= \sum _{m_1=1}^{n} \cdots \sum _{m_{p-1}=1}^{n} \left\lfloor {\frac{n}{m_1\cdots m_{p-1}}}\right\rfloor \\&\le \sum _{m_1=1}^{n} \cdots \sum _{m_{p-1}=1}^{n} \frac{n}{m_1\cdots m_{p-1}} = n\, H_n^{p-1}. \end{aligned}$$

\(\square \)
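The bound is easy to verify by brute force for small n and p (a standalone sketch, not part of the paper):

```python
from itertools import product
from math import prod

def card(n, p):
    """Brute-force count of m in N_+^p with m_1 * ... * m_p <= n."""
    return sum(1 for m in product(range(1, n + 1), repeat=p) if prod(m) <= n)

def harmonic(n):
    """n-th harmonic number H_n."""
    return sum(1.0 / k for k in range(1, n + 1))

n, p = 20, 3
assert card(n, p) <= n * harmonic(n) ** (p - 1)
```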

Theorem 7

(Divisor bound) Let \(N\in {\mathbb {N}}_+\) and let \(\textrm{div}(N)\) be the set of divisors of N. We have for all \(\epsilon >0\):

$$\begin{aligned} {{\,\textrm{Card}\,}}\!\big (\textrm{div}(N)\big ) = {{\,\mathrm{\textrm{o}}\,}}(N^\epsilon ). \end{aligned}$$

As a consequence, we have for all \(\epsilon >0\):

$$\begin{aligned} {{\,\textrm{Card}\,}}{\lbrace }{ m\in {\mathbb {N}}_+^p \,\vert \, m_1\cdots m_p = N }{\rbrace } \le {{\,\textrm{Card}\,}}\!\big (\textrm{div}(N)\big )^p = {{\,\mathrm{\textrm{o}}\,}}(N^\epsilon ). \end{aligned}$$

A proof of this result can be found in Tao (2008).
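The second statement can be checked directly for a small N (a sketch; the choices N = 60 and p = 3 are arbitrary):

```python
from itertools import product
from math import prod

def divisors(N):
    """All positive divisors of N."""
    return [k for k in range(1, N + 1) if N % k == 0]

N, p = 60, 3
divs = divisors(N)

# Every factor of an ordered factorization of N divides N,
# hence the count is at most Card(div(N))^p.
factorizations = sum(1 for m in product(divs, repeat=p) if prod(m) == N)
assert factorizations <= len(divs) ** p
```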

About this article


Cite this article

Dussap, F. Nonparametric multiple regression by projection on non-compactly supported bases. Ann Inst Stat Math 75, 731–771 (2023). https://doi.org/10.1007/s10463-022-00863-1
