Abstract
We discuss local linear smooth backfitting for additive nonparametric models. This procedure is well known for achieving optimal convergence rates under appropriate smoothness conditions. In particular, it allows for the estimation of each component of an additive model with the same asymptotic accuracy as if the other components were known. The asymptotic discussion of local linear smooth backfitting is rather complex because typically an overwhelming notation is required for a detailed discussion. In this paper we interpret the local linear smooth backfitting estimator as a projection of the data onto a linear space with a suitably chosen semi-norm. This approach simplifies both the mathematical discussion as well as the intuitive understanding of properties of this version of smooth backfitting.
Appendices
Appendix 1: Projection Operators
In this section we state expressions for the projection operators \(\mathcal P_0\), \(\mathcal P_k\), \(P_k\) and \(\mathcal P_{k'}\) (\(1 \le k \le d\)) mapping elements of \(\mathcal H\) to \(\mathcal H_0\), \(\mathcal H_k\), \(\mathcal H_k + \mathcal H_0\) and \(\mathcal H_{k'}\), respectively; see Sect. 2. For an element \(f = (f^{i,j})_{i=1,\dots ,n;\ j=0,\dots ,d}\) the operators \(\mathcal P_0\), \(\mathcal P_k\), and \(P_k\) (\(1 \le k \le d\)) set all components to zero except those with indices \((i,0)\), \(i=1,\dots ,n\). Furthermore, in the case \(d < k \le 2d\) only the components with index \((i,k-d)\), \(i=1,\dots ,n\) are non-zero. Thus, for the definition of the operators it remains to set
For \(1 \le k \le d\) it suffices to define \((\mathcal P_k(f) ) ^{i,0}(x) = (P_k(f) ) ^{i,0}(x)- (\mathcal P_0(f) ) ^{i,0}\) and
For the orthogonal projections of functions \(m \in \mathcal H_{add} \) one can use simplified formulas. In particular, these formulas can be used in our algorithm for updating functions \(m \in \mathcal H_{add} \). If \(m \in \mathcal H_{add} \) has components \(m_0,\dots ,m_d, m^{(1)}_1,\dots ,m^{(1)}_d\) the operators \(P_k \) and \(P_{k'}\) are defined as follows
where for \(1\le j,k \le d\) with \(k \not =j\)
with \(\mathcal X_{-(jk)}(x_j,x_k) =\{ u \in \mathcal X : u_k=x_k, u_j=x_j\}\) and \(\mathcal X_{-k,j}(x_k)=\{ u\in \mathcal X _j:\) there exists \(v \in \mathcal X\) with \(v_k=x_k\) and \(v_j = u\}\) and \(u_{-(jk)} \) denoting the vector \((u_l: l \in \{1,\dots ,d\}\backslash \{j,k\} )\).
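As a finite-dimensional illustration of the projection view of backfitting (an analogue constructed for this sketch, not the operators \(\mathcal P_k\) above), one can alternate projections onto two subspaces of \(\mathbb R^n\) and compare the backfitting limit with the direct orthogonal projection onto their sum:

```python
import numpy as np

# Finite-dimensional analogue (an assumption, not the paper's setting):
# project y onto L = L1 + L2 by alternating component updates, as in
# backfitting, and compare with the direct projection onto L1 + L2.
rng = np.random.default_rng(0)
n = 200
X1 = rng.normal(size=(n, 3))   # basis of the subspace L1
X2 = rng.normal(size=(n, 3))   # basis of the subspace L2
y = rng.normal(size=n)

def proj(X):
    # orthogonal projection matrix onto the column space of X
    return X @ np.linalg.solve(X.T @ X, X.T)

P1, P2 = proj(X1), proj(X2)

# backfitting: update the two component fits in turn
f1 = np.zeros(n)
f2 = np.zeros(n)
for _ in range(200):
    f1 = P1 @ (y - f2)
    f2 = P2 @ (y - f1)

# direct projection onto L1 + L2 for comparison
P = proj(np.hstack([X1, X2]))
print(np.max(np.abs(f1 + f2 - P @ y)))  # essentially zero
```

For two blocks this iteration is exactly the alternating-projection (Gauss-Seidel) scheme; when the subspaces are at a positive angle it converges geometrically to the projection of \(y\) onto \(L_1 + L_2\).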
Appendix 2: Proofs of Propositions 1 and 2
In this section we will give proofs for Propositions 1 and 2. They were used in Sect. 3 for the discussion of the existence of the smooth backfitting estimator as well as the convergence of an algorithm for its calculation.
Proof (of Proposition 1)
\(\mathbf {{(ii) \Rightarrow (i)}}.\) Let \(g^{(n)}\in L\) be a Cauchy sequence. We must show \(\lim _{n \rightarrow \infty } g^{(n)} \in L\). By definition of L there exist sequences \(g_1^{(n)}\in L_1\) and \(g_2^{(n)}\in L_2\) such that \(g^{(n)}=g_1^{(n)}+g_2^{(n)}\). With (8), for \(i=1,2\) we obtain
Hence, \(g_1^{(n)}\) and \(g_2^{(n)}\) are Cauchy sequences. Since \(L_1\) and \(L_2\) are closed, their limits are elements of \(L_1\subseteq L\) and \(L_2\subseteq L\), respectively. Thus,
\(\mathbf {{(i) \Rightarrow (iii)}}.\) We write \(\Pi _1\) as shorthand for \(\Pi _1(L_2)\). Since L is closed, it is a Banach space. By the closed graph theorem, it suffices to show the following: if \(g^{(n)}\in L\) and \(\Pi _1 g^{(n)}\in L_1\) are convergent sequences with limits g and \(g_1\), then \(\Pi _1g=g_1\).
Let \(g^{(n)}\in L\) and \(\Pi _1 g^{(n)}\in L_1\) be sequences with limits g and \(g_1\), respectively. Write \(g^{(n)}=g_1^{(n)}+g_2^{(n)}\). Since
\(g_2^{(n)}\) is a Cauchy sequence converging to a limit \(g_2\in L_2\). We conclude \(g=g_1+g_2\), meaning \(\Pi _1g=g_1\).
\(\mathbf {{(iii) \Rightarrow (ii)}}.\) If \(\Pi _1\) is a bounded operator, then so is \(\Pi _2\), since \(\left\Vert g_2\right\Vert \le \left\Vert g\right\Vert +\left\Vert g_1\right\Vert \). Denote the corresponding operator norms by \(C_1\) and \(C_2\), respectively. Then
which concludes the proof by choosing \(c=\frac{1}{\max \{C_1,C_2\}}\).
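The display omitted above presumably combines the operator-norm bounds \(\Vert g_i\Vert \le C_i\Vert g\Vert \); assuming condition (ii) is stated via the maximum of the two component norms, a sketch of the reconstruction reads:

```latex
\max\bigl\{\Vert g_1\Vert ,\Vert g_2\Vert \bigr\}
\le \max\{C_1,C_2\}\,\Vert g\Vert ,
\qquad\text{so that}\qquad
c\,\max\bigl\{\Vert g_1\Vert ,\Vert g_2\Vert \bigr\}\le \Vert g\Vert
\quad\text{with } c=\frac{1}{\max\{C_1,C_2\}}.
```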
\(\mathbf {{(iii) \Leftrightarrow (iv)}}.\) This follows from
Lemma 9
Let \(L_1,L_2\) be closed subspaces of a Hilbert space. For \(\gamma \) defined as in Proposition 1 we have
Proof
Proof (of Proposition 2)
Let \(\mathcal P_j\) be the orthogonal projection onto \(L_j\). Following Lemma 9 we have
By Proposition 1, it suffices to show \(\left\Vert \mathcal P_2 \mathcal P_1\right\Vert <1\) in order to conclude that L is closed. Observe that \(\left\Vert \mathcal P_2 \mathcal P_1\right\Vert \le 1\) because for \(g \in L\)
which yields \(\left\Vert \mathcal P_i\right\Vert \le 1\) for \(i=1,2\). To show the strict inequality, note that if \(\mathcal P_2\big |_{L_1}\) is compact, then so is \(\mathcal P_2 \mathcal P_1\), since the composition of two operators is compact if at least one of them is compact.
Thus, for every \(\varepsilon >0\), \(\mathcal P_2 \mathcal P_1\) has at most a finite number of eigenvalues greater than \(\varepsilon \). Since 1 is clearly not an eigenvalue, we conclude \(\left\Vert {\mathcal P_2 \mathcal P_1}\right\Vert <1\).
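A small numerical check of this final step (a finite-dimensional example chosen for this sketch, not taken from the paper): for two subspaces of \(\mathbb R^3\) with trivial intersection, the composition of the two orthogonal projections has operator norm strictly below one.

```python
import numpy as np

# Illustration (assumption: subspaces with trivial intersection):
# the spectral norm of P2 @ P1 equals the cosine of the angle
# between the subspaces and is strictly less than 1.
def proj(X):
    # orthogonal projection matrix onto the column space of X
    return X @ np.linalg.solve(X.T @ X, X.T)

X1 = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])  # L1 = span(e1, e2)
X2 = np.array([[1.0], [1.0], [1.0]])                  # L2 = span((1,1,1))
P1, P2 = proj(X1), proj(X2)

norm = np.linalg.norm(P2 @ P1, ord=2)  # largest singular value
print(norm)  # approx 0.816, strictly less than 1
```

Here the norm equals \(\sqrt{6}/3 \approx 0.816\), the cosine of the angle between \((1,1,0)/\sqrt 2\) and \((1,1,1)/\sqrt 3\), consistent with the geometric-convergence argument above.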
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Hiabu, M., Mammen, E., Meyer, J.T. (2023). Local Linear Smoothing in Additive Models as Data Projection. In: Belomestny, D., Butucea, C., Mammen, E., Moulines, E., Reiß, M., Ulyanov, V.V. (eds) Foundations of Modern Statistics. FMS 2019. Springer Proceedings in Mathematics & Statistics, vol 425. Springer, Cham. https://doi.org/10.1007/978-3-031-30114-8_5
Print ISBN: 978-3-031-30113-1
Online ISBN: 978-3-031-30114-8
eBook Packages: Mathematics and Statistics (R0)