Nonparametric multiple regression by projection on non-compactly supported bases

Dussap, Florian

doi:10.1007/s10463-022-00863-1


Published: 22 January 2023

Volume 75, pages 731–771 (2023)

Annals of the Institute of Statistical Mathematics

Florian Dussap¹


Abstract

We study the nonparametric regression estimation problem with a random design in ${\mathbb{R}}^{p}$ with $p\ge 2$. We do so by using a projection estimator obtained by least squares minimization. Our contribution is to consider non-compact estimation domains in ${\mathbb {R}}^{p}$, on which we recover the function, and to provide a theoretical study of the risk of the estimator relative to a norm weighted by the distribution of the design. We propose a model selection procedure in which the model collection is random and takes into account the discrepancy between the empirical norm and the norm associated with the distribution of the design. We prove that the resulting estimator automatically optimizes the bias-variance trade-off in both norms, and we illustrate the numerical performance of our procedure on simulated data.
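As a rough illustration of a least-squares projection estimator on a non-compactly supported basis, here is a minimal sketch. It assumes a tensor-product Hermite basis on $\mathbb{R}^2$ and a single fixed model $(m_1, m_2)$; the basis choice, the function names (`hermite_functions`, `design_row`, `projection_estimator`) and the absence of any model-selection step are assumptions made for the example, not the paper's implementation.

```python
import math
import random


def hermite_functions(x, m):
    """[h_0(x), ..., h_{m-1}(x)]: orthonormal Hermite functions on R (m >= 1)."""
    h = [math.pi ** -0.25 * math.exp(-x * x / 2.0)]
    if m > 1:
        h.append(math.sqrt(2.0) * x * h[0])
    for j in range(1, m - 1):
        # Three-term recurrence for the normalized Hermite functions.
        h.append(math.sqrt(2.0 / (j + 1)) * x * h[j] - math.sqrt(j / (j + 1)) * h[j - 1])
    return h


def design_row(x1, x2, m1, m2):
    """Tensor-product features h_j(x1) * h_k(x2), j < m1, k < m2 (index j * m2 + k)."""
    h1, h2 = hermite_functions(x1, m1), hermite_functions(x2, m2)
    return [a * b for a in h1 for b in h2]


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def projection_estimator(X, Y, m1, m2):
    """Least-squares coefficients on the (m1 x m2) tensor Hermite model (normal equations)."""
    Phi = [design_row(x1, x2, m1, m2) for x1, x2 in X]
    d = m1 * m2
    gram = [[sum(row[i] * row[j] for row in Phi) for j in range(d)] for i in range(d)]
    rhs = [sum(row[i] * y for row, y in zip(Phi, Y)) for i in range(d)]
    return solve(gram, rhs)


# Noiseless sanity check: Y lies exactly in the model, so least squares recovers it.
rng = random.Random(0)
X = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(200)]
true_coef = [1.0, -0.5, 0.0, 0.25, 0.0, 2.0]  # m1 = 2, m2 = 3
Y = [sum(a * f for a, f in zip(true_coef, design_row(x1, x2, 2, 3))) for x1, x2 in X]
coef = projection_estimator(X, Y, 2, 3)
print(coef)
```

With noisy responses the same function returns the least-squares projection onto the chosen model; choosing $(m_1, m_2)$ from the data is the role of the model selection procedure described above, which this sketch does not attempt.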


Notes

In general, it is only a semi-norm, but we only consider subspaces on which it is a norm.

References

Arlot, S., Massart, P. (2009). Data-driven calibration of penalties for least-squares regression. Journal of Machine Learning Research, 10(10), 245–279.
Baraud, Y. (2000). Model selection for regression on a fixed design. Probability Theory and Related Fields, 117(4), 467–493.
Baraud, Y. (2002). Model selection for regression on a random design. ESAIM: Probability and Statistics, 6, 127–146.
Barron, A., Birgé, L., Massart, P. (1999). Risk bounds for model selection via penalization. Probability Theory and Related Fields, 113(3), 301–413.
Birgé, L., Massart, P. (1998). Minimum contrast estimators on sieves: Exponential bounds and rates of convergence. Bernoulli, 4(3), 329–375.
Cohen, A., Davenport, M. A., Leviatan, D. (2013). On the stability and accuracy of least squares approximations. Foundations of Computational Mathematics, 13(5), 819–834.
Comte, F., Genon-Catalot, V. (2018). Laguerre and Hermite bases for inverse problems. Journal of the Korean Statistical Society, 47(3), 273–296.
Comte, F., Genon-Catalot, V. (2020a). Regression function estimation as a partly inverse problem. Annals of the Institute of Statistical Mathematics, 72(4), 1023–1054.
Comte, F., Genon-Catalot, V. (2020b). Regression function estimation on non compact support in an Heteroscesdastic model. Metrika, 83(1), 93–128.
Comte, F., Marie, N. (2021). On a Nadaraya-Watson estimator with two bandwidths. Electronic Journal of Statistics, 15(1), 2566–2607.
Efromovich, S. (1999). Nonparametric curve estimation: Methods, theory and applications. Springer series in statistics. New York: Springer.
Gittens, A., Tropp, J. A. (2011). Tail bounds for all eigenvalues of a sum of random matrices. arXiv:1104.4513 [math].
Györfi, L., Kohler, M., Krzyżak, A., Walk, H. (2002). A distribution-free theory of nonparametric regression. Springer series in statistics. New York, NY: Springer New York.
Härdle, W., Marron, J. S. (1985). Optimal bandwidth selection in nonparametric regression function estimation. The Annals of Statistics, 13(4), 1465–1481.
Köhler, M., Schindler, A., Sperlich, S. (2014). A review and comparison of bandwidth selection methods for Kernel regression: Review of bandwidth selection for regression. International Statistical Review, 82(2), 243–274.
Lacour, C., Massart, P., Rivoirard, V. (2017). Estimator selection: A new method with applications to Kernel density estimation. Sankhya A, 79(2), 298–335.
Mabon, G. (2017). Adaptive deconvolution on the non-negative real line: Adaptive deconvolution on R+. Scandinavian Journal of Statistics, 44(3), 707–740.
Nadaraya, E. A. (1964). On estimating regression. Theory of Probability & Its Applications, 9(1), 141–142.
Sacko, O. (2020). Hermite density deconvolution. Latin American Journal of Probability and Mathematical Statistics, 17(1), 419–443.
Tao, T. (2008). The divisor bound. https://terrytao.wordpress.com/2008/09/23/the-divisor-bound.
Tropp, J. A. (2012). User-friendly tail bounds for sums of random matrices. Foundations of Computational Mathematics, 12(4), 389–434.
Tsybakov, A. B. (2009). Introduction to nonparametric estimation. Series in statistics. London: Springer.
Watson, G. S. (1964). Smooth regression analysis. Sankhyā: The Indian Journal of Statistics, Series A, 26(4), 359–372.


Acknowledgements

I want to thank Fabienne Comte and Céline Duval for their helpful advice and their support of my work. I also want to thank Florence Merlevède for her help with the second inequality of the Matrix Chernoff bound. Finally, I want to thank Herb Susmann for proofreading this article.

Author information

Authors and Affiliations

MAP5 Laboratory, Université Paris Cité, 45 rue des Saints-Pères, 75006, Paris, France
Florian Dussap


Corresponding author

Correspondence to Florian Dussap.

Ethics declarations

Conflict of interest

The author declares that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was supported by a grant from Région Île-de-France.

Appendices

A Linear algebra

Lemma 8

Let E be a Euclidean vector space and let $\ell :E\rightarrow {\mathbb {R}}^n$ be an injective linear map. For $y\in {\mathbb {R}}^n$, the unique solution of the least-squares problem:

$$\begin{aligned} {\hat{a}} {:}{=} {{\,\mathrm{arg\, min}\,}}_{a\in E} \,\left| \left| y - \ell (a)\right| \right| _{{\mathbb {R}}^n}^2 \end{aligned}$$

is given by:

$$\begin{aligned} {\hat{a}} = \left[ (\ell ^* \circ \ell )^{-1} \circ \ell ^*\right] (y), \end{aligned}$$

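where $\ell ^*:{\mathbb {R}}^n\rightarrow E$ is the adjoint of $\ell$, characterized by the relation ${\langle }{ y, \ell (a) }{\rangle }_{{\mathbb {R}}^n} = {\langle }{ \ell ^*(y), a }{\rangle }_{E}$ for all $a\in E$ and $y\in {\mathbb {R}}^n$. The operator $\ell ^* \circ \ell$ is invertible because ${\langle }{ (\ell ^*\circ \ell )(a), a }{\rangle }_{E} = \left| \left| \ell (a)\right| \right| _{{\mathbb {R}}^n}^2 > 0$ for every $a\ne 0$, by injectivity of $\ell$.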
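In coordinates, with $E = \mathbb{R}^d$ carrying the standard inner product, $\ell$ is an $n\times d$ matrix $L$ of full column rank (this is injectivity), $\ell^*$ is its transpose, and Lemma 8 is the familiar normal-equation formula $\hat a = (L^\top L)^{-1}L^\top y$. The sketch below, whose helper names are hypothetical, computes $\hat a$ exactly over the rationals for a small straight-line fit and checks that the residual is orthogonal to the range of $L$.

```python
from fractions import Fraction


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def solve(A, b):
    """Exact Gauss-Jordan elimination over the rationals (A assumed invertible)."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        pivot = M[col][col]
        M[col] = [v / pivot for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [v - f * w for v, w in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def least_squares(L, y):
    """a_hat = [(l* o l)^{-1} o l*](y), with l* represented by the transpose of L."""
    Lt = transpose(L)
    gram = matmul(Lt, L)                                   # matrix of l* o l
    rhs = [sum(a * b for a, b in zip(row, y)) for row in Lt]  # l*(y)
    return solve(gram, rhs)


# Fit y = a_0 + a_1 * x at x = 0, 1, 2, 3.
L = [[1, 0], [1, 1], [1, 2], [1, 3]]
y = [1, 3, 2, 5]
a_hat = least_squares(L, y)
print(a_hat)  # [Fraction(11, 10), Fraction(11, 10)]

# Orthogonality of the residual: l*(y - l(a_hat)) = 0.
residual = [yi - sum(lij * aj for lij, aj in zip(row, a_hat)) for row, yi in zip(L, y)]
print([sum(a * r for a, r in zip(col, residual)) for col in transpose(L)])  # [Fraction(0, 1), Fraction(0, 1)]
```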

Lemma 9

Let $\textbf{A}$, $\textbf{B}$ be square matrices of the same size. If $\textbf{A}$ is invertible and $\left| \left| \textbf{A}^{-1}\textbf{B} \right| \right| _{\textrm{op}}<1$, then $\textbf{A} +\textbf{B}$ is invertible and:

$$\begin{aligned} \left| \left| (\textbf{A} + {\textbf{B}})^{-1} - \textbf{A}^{-1}\right| \right| _{\textrm{op}} \le \frac{\left| \left| {\textbf{A}}^{-1}\right| \right| _{\textrm{op}}^2 \left| \left| {\textbf{B}}\right| \right| _{\textrm{op}}}{1 - \left| \left| {\textbf{A}}^{-1}{\textbf{B}}\right| \right| _{\textrm{op}}}. \end{aligned}$$
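A quick numerical sanity check of Lemma 9 on random $2\times 2$ matrices, in plain Python with the closed-form $2\times 2$ inverse and spectral norm (the square root of the largest eigenvalue of $\textbf{M}^\top\textbf{M}$). This is an illustration rather than a proof; the scale $0.3$ for $\textbf{B}$, the seed, and the helper names are arbitrary choices made for the example.

```python
import math
import random


def op_norm(M):
    """Spectral norm of a 2x2 matrix: sqrt of the largest eigenvalue of M^T M."""
    (a, b), (c, d) = M
    p, q, r = a * a + c * c, a * b + c * d, b * b + d * d
    return math.sqrt((p + r) / 2 + math.sqrt(((p - r) / 2) ** 2 + q * q))


def inv(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def mul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def add(M, N, s=1.0):
    """Entrywise M + s * N."""
    return [[M[i][j] + s * N[i][j] for j in range(2)] for i in range(2)]


def check_lemma9(A, B):
    """None if ||A^{-1} B||_op >= 1, else whether the bound of Lemma 9 holds."""
    Ainv = inv(A)
    rho = op_norm(mul(Ainv, B))
    if rho >= 1:
        return None
    lhs = op_norm(add(inv(add(A, B)), Ainv, -1.0))
    rhs = op_norm(Ainv) ** 2 * op_norm(B) / (1 - rho)
    return lhs <= rhs * (1 + 1e-6)  # small slack for floating-point rounding


rng = random.Random(1)
results = []
for _ in range(2000):
    A = [[rng.gauss(0, 1) for _ in range(2)] for _ in range(2)]
    B = [[0.3 * rng.gauss(0, 1) for _ in range(2)] for _ in range(2)]
    if abs(A[0][0] * A[1][1] - A[0][1] * A[1][0]) < 1e-3:
        continue  # skip nearly singular A
    outcome = check_lemma9(A, B)
    if outcome is not None:
        results.append(outcome)
print(len(results), all(results))
```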

B Concentration inequalities

Proofs of the following bounds can be found in Tropp (2012) and Gittens and Tropp (2011).

Theorem 5

(Matrix Chernoff bound) Let ${\textbf{Z}}_1, \dotsc , {\textbf{Z}}_n$ be independent random self-adjoint positive semi-definite matrices of dimension d such that $\sup _k \lambda _{\max }({\textbf{Z}}_k) \le R$ a.s. If we define:

$$\begin{aligned} \mu _{\min } {:}{=} \lambda _{\min }\!\left( \sum _{k=1}^{n} {\mathbb {E}}[{\textbf{Z}}_k] \right) , \end{aligned}$$

then we have:

$$\begin{aligned} \forall \delta \in (0,1),\quad {\mathbb {P}}\left[ \lambda _{\min }\!\left( \sum _{k=1}^{n} {\textbf{Z}}_k \right) \le (1-\delta ) \mu _{\min } \right] \le d\times \left( \frac{\textrm{e}^{-\delta }}{(1-\delta )^{(1-\delta )}} \right) ^{\mu _{\min } / R}, \end{aligned}\tag{29}$$

$$\begin{aligned} \forall \delta >0,\quad {\mathbb {P}}\left[ \lambda _{\min }\!\left( \sum _{k=1}^{n} {\textbf{Z}}_k \right) \ge (1+\delta ) \mu _{\min } \right] \le \left( \frac{\textrm{e}^{\delta }}{(1+\delta )^{(1+\delta )}} \right) ^{\mu _{\min } / R}. \end{aligned}\tag{30}$$
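Theorem 5 also covers the scalar case $d = 1$, where it reduces to a Chernoff bound for sums of bounded nonnegative random variables. The sketch below is a hypothetical illustration, not taken from the paper: it evaluates the right-hand sides of (29) and (30) and compares them with Monte Carlo tail frequencies for Bernoulli(1/2) summands, so that $R = 1$ and $\mu_{\min} = n/2$. The sample sizes, $\delta = 0.2$ and the seed are arbitrary.

```python
import math
import random


def chernoff_lower(delta, mu_min, R, d=1):
    """Right-hand side of (29), valid for 0 < delta < 1."""
    return d * (math.exp(-delta) / (1 - delta) ** (1 - delta)) ** (mu_min / R)


def chernoff_upper(delta, mu_min, R):
    """Right-hand side of (30), valid for delta > 0."""
    return (math.exp(delta) / (1 + delta) ** (1 + delta)) ** (mu_min / R)


def empirical_tails(n=200, p=0.5, delta=0.2, trials=2000, seed=0):
    """Monte Carlo tail frequencies for a sum of n Bernoulli(p) scalars (d = 1, R = 1)."""
    rng = random.Random(seed)
    mu_min = n * p  # lambda_min of sum_k E[Z_k] in dimension 1
    low = high = 0
    for _ in range(trials):
        s = sum(1 for _ in range(n) if rng.random() < p)
        low += int(s <= (1 - delta) * mu_min)
        high += int(s >= (1 + delta) * mu_min)
    return low / trials, high / trials, mu_min


low, high, mu = empirical_tails()
print(low, chernoff_lower(0.2, mu, 1.0))
print(high, chernoff_upper(0.2, mu, 1.0))
```

In this regime both bounds are far from tight; their value lies in holding uniformly and in the explicit dependence on $d$, $R$ and $\mu_{\min}$.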

Theorem 6

(Matrix Bernstein bound) Let ${\textbf{Z}}_1, \dotsc , {\textbf{Z}}_n$ be independent random self-adjoint matrices of dimension d such that ${\mathbb {E}}[{\textbf{Z}}_k] = \textbf{0}$ and $\sup _k \lambda _{\max }({\textbf{Z}}_k) \le R$ a.s. If $v>0$ is such that:

$$\begin{aligned} \left| \left| \sum _{k=1}^{n} {\mathbb {E}}\left[ {\textbf{Z}}_k^2\right] \right| \right| _\textrm{op}\le v, \end{aligned}$$

then for all $x>0$ we have:

$$\begin{aligned} {\mathbb {P}}\left[ \lambda _{\max } \!\left( \sum _{k=1}^{n} {\textbf{Z}}_k \right) \ge x \right] \le d\times \exp \left( \frac{-x^2/2}{v + \frac{R}{3}x} \right) . \end{aligned}$$
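In applications the Bernstein bound is usually inverted: given a target probability $\eta \in (0, d)$, solve $d\exp\!\big(-\tfrac{x^2/2}{v + Rx/3}\big) = \eta$ for $x$. Writing $L = \log(d/\eta)$, this is the quadratic $x^2 - \tfrac{2RL}{3}x - 2Lv = 0$, whose positive root is $x = \tfrac{RL}{3} + \sqrt{(RL/3)^2 + 2Lv}$. The helper names below are hypothetical; the sketch only checks this algebra.

```python
import math


def bernstein_bound(x, d, v, R):
    """Right-hand side of the matrix Bernstein inequality at level x > 0."""
    return d * math.exp(-(x * x / 2) / (v + R * x / 3))


def bernstein_threshold(eta, d, v, R):
    """Smallest x > 0 with bernstein_bound(x, d, v, R) <= eta, assuming 0 < eta < d."""
    L = math.log(d / eta)
    # Positive root of x^2 - (2 R L / 3) x - 2 L v = 0.
    return R * L / 3 + math.sqrt((R * L / 3) ** 2 + 2 * L * v)


x = bernstein_threshold(0.01, d=10, v=2.0, R=1.0)
print(x, bernstein_bound(x, d=10, v=2.0, R=1.0))  # second value equals 0.01 up to rounding
```

Since $x \mapsto \tfrac{x^2/2}{v + Rx/3}$ is increasing on $(0,\infty)$, the bound is decreasing in $x$, so with probability at least $1-\eta$ we have $\lambda_{\max}(\sum_k \textbf{Z}_k) < x$. Using $\sqrt{a+b}\le\sqrt a+\sqrt b$, this threshold is at most $\sqrt{2vL} + \tfrac{2RL}{3}$, a convenient simplified form.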

C Combinatorics

Proposition 4

For $n\ge 1$ and $p\ge 2$ we have:

$$\begin{aligned} {{\,\textrm{Card}\,}}{\lbrace }{ {\varvec{m}}\in {\mathbb {N}}_+^p \,\vert \, m_1\cdots m_p \le n }{\rbrace } \le n\, H_n^{p-1}, \end{aligned}$$

where $H_n {:}{=} \sum _{k=1}^{n} \frac{1}{k}$ is the n-th harmonic number.

Proof

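Since $m_1\cdots m_p \le n$ forces $m_i \le n$ for every $i$, each sum can be restricted to $\lbrace 1, \dotsc , n\rbrace$. We compute: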

$$\begin{aligned} {{\,\textrm{Card}\,}}{\lbrace }{ {\varvec{m}}\in {\mathbb {N}}_+^p \,\vert \, m_1\cdots m_p \le n }{\rbrace }&= \sum _{m_1=1}^{n} \cdots \sum _{m_p=1}^{n} \textbf{1}_{m_1\cdots m_p \le n} \\&= \sum _{m_1=1}^{n} \cdots \sum _{m_p=1}^{n} \textbf{1}_{m_p \le \frac{n}{m_1\cdots m_{p-1}}} \\&= \sum _{m_1=1}^{n} \cdots \sum _{m_{p-1}=1}^{n} \left\lfloor {\frac{n}{m_1\cdots m_{p-1}}}\right\rfloor \\&\le \sum _{m_1=1}^{n} \cdots \sum _{m_{p-1}=1}^{n} \frac{n}{m_1\cdots m_{p-1}} = n\, H_n^{p-1}. \end{aligned}$$

$\square $
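Proposition 4 can be checked by brute force for small $n$ and $p$. The sketch below counts the tuples recursively (fixing $m_1 = k$ leaves $p-1$ factors whose product is at most $\lfloor n/k\rfloor$) and compares the count with $n H_n^{p-1}$ computed exactly with fractions, so that equality cases such as $n = 2$, $p = 2$ are not affected by rounding. The function names are illustrative.

```python
from fractions import Fraction


def count_tuples(n, p):
    """Card{m in N_+^p : m_1 * ... * m_p <= n}, computed recursively."""
    if p == 1:
        return n
    # Fix m_1 = k: the other p - 1 factors must have product <= n // k.
    return sum(count_tuples(n // k, p - 1) for k in range(1, n + 1))


def harmonic(n):
    """Exact n-th harmonic number H_n as a Fraction."""
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))


def bound(n, p):
    """Right-hand side n * H_n^(p-1) of Proposition 4."""
    return n * harmonic(n) ** (p - 1)


print(count_tuples(4, 2), float(bound(4, 2)))  # 8 and about 8.33
```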

Theorem 7

(Divisor bound) Let $N\in {\mathbb {N}}_+$ and let $\textrm{div}(N)$ be the set of divisors of N. For all $\epsilon >0$, as $N\rightarrow \infty$:

$$\begin{aligned} {{\,\textrm{Card}\,}}\!\big (\textrm{div}(N)\big ) = {{\,\mathrm{\textrm{o}}\,}}(N^\epsilon ). \end{aligned}$$

As a consequence, since every coordinate $m_i$ of such a ${\varvec{m}}$ divides N, we have for all $\epsilon >0$ and fixed p:

$$\begin{aligned} {{\,\textrm{Card}\,}}{\lbrace }{ {\varvec{m}}\in {\mathbb {N}}_+^p \,\vert \, m_1\cdots m_p = N }{\rbrace } \le {{\,\textrm{Card}\,}}\!\big (\textrm{div}(N)\big )^p = {{\,\mathrm{\textrm{o}}\,}}(N^\epsilon ). \end{aligned}$$

A proof of this result can be found in Tao (2008).
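The counting consequence of Theorem 7 can be illustrated numerically. The sketch enumerates ordered factorizations recursively ($m_1$ ranges over the divisors of $N$, and the remaining $p-1$ factors multiply to $N/m_1$) and compares them with $\textrm{Card}(\textrm{div}(N))^p$. It is an illustration with hypothetical helper names and says nothing about the asymptotic $\textrm{o}(N^\epsilon)$ statement itself, which requires the argument in Tao (2008).

```python
def divisors(N):
    """All positive divisors of N (naive trial division, fine for small N)."""
    return [k for k in range(1, N + 1) if N % k == 0]


def count_factorizations(N, p):
    """Card{m in N_+^p : m_1 * ... * m_p = N}, i.e. ordered factorizations of N."""
    if p == 1:
        return 1
    # m_1 must divide N; the remaining p - 1 factors multiply to N // m_1.
    return sum(count_factorizations(N // k, p - 1) for k in divisors(N))


for N in (12, 60, 360):
    d = len(divisors(N))
    print(N, d, count_factorizations(N, 3), d ** 3)
```

For $p = 2$ the count equals $\textrm{Card}(\textrm{div}(N))$ exactly, since $m_1$ determines $m_2$; the bound by the $p$-th power is crude but sufficient for the $\textrm{o}(N^\epsilon)$ conclusion.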

About this article

Cite this article

Dussap, F. Nonparametric multiple regression by projection on non-compactly supported bases. Ann Inst Stat Math 75, 731–771 (2023). https://doi.org/10.1007/s10463-022-00863-1


Received: 03 January 2022
Revised: 24 October 2022
Accepted: 22 November 2022
Published: 22 January 2023
Issue Date: October 2023
DOI: https://doi.org/10.1007/s10463-022-00863-1

