Local Linear Smoothing in Additive Models as Data Projection

Conference paper

Foundations of Modern Statistics (FMS 2019)

Part of the book series: Springer Proceedings in Mathematics & Statistics (PROMS, volume 425)

Abstract

We discuss local linear smooth backfitting for additive nonparametric models. This procedure is well known to achieve optimal convergence rates under appropriate smoothness conditions. In particular, it allows each component of an additive model to be estimated with the same asymptotic accuracy as if the other components were known. The asymptotic analysis of local linear smooth backfitting is rather involved, since a detailed treatment typically requires overwhelming notation. In this paper we interpret the local linear smooth backfitting estimator as a projection of the data onto a linear space equipped with a suitably chosen semi-norm. This approach simplifies both the mathematical analysis and the intuitive understanding of the properties of this version of smooth backfitting.



Author information

Correspondence to Enno Mammen.
Appendices

Appendix 1: Projection Operators

In this section we state expressions for the projection operators \(\mathcal P_0\), \(\mathcal P_k\), \(P_k\) and \(\mathcal P_{k'}\) (\(1 \le k \le d\)) mapping elements of \(\mathcal H\) to \(\mathcal H_0\), \(\mathcal H_k\), \(\mathcal H_k + \mathcal H_0\) and \(\mathcal H_{k'}\), respectively; see Sect. 2. For an element \(f = (f^{i,j})_{i=1,\dots ,n;\ j=0,\dots ,d}\) the operators \(\mathcal P_0\), \(\mathcal P_k\), and \(P_k\) (\(1 \le k \le d\)) set all components to zero except those with indices \((i,0)\), \(i=1,\dots ,n\). Furthermore, in the case \(d < k \le 2d\), only the components with index \((i,k-d)\), \(i=1,\dots ,n\), are non-zero. Thus, for the definition of the operators it remains to set

$$\begin{aligned} (\mathcal P_0(f))^{i,0}(x) = \frac{1}{n} \sum _{i=1}^n \int _{\mathcal X} \bigg\{f^{i,0}(u)+\sum _{j=1}^d f^{i,j}(u)(X_{ij}-u_j)\bigg\} K^{X_i}_h(X_i-u)\,\mathrm d u. \end{aligned}$$

For \(1 \le k \le d\) it suffices to define \((\mathcal P_k(f) ) ^{i,0}(x) = (P_k(f) ) ^{i,0}(x)- (\mathcal P_0(f) ) ^{i,0}\) and

$$\begin{aligned} (P_k(f))^{i,0}(x) &= \frac{1}{\hat{p}_k(x_k)} \bigg[\frac{1}{n} \sum _{i=1}^n \int _{u \in \mathcal X_{-k}(x_k)} \bigg\{f^{i,0}(u)+\sum _{j=1}^d f^{i,j}(u)(X_{ij}-u_j)\bigg\} \\ &\qquad \times K^{X_i}_h(X_i-u)\,\mathrm d u_{-k} \bigg], \end{aligned}$$
$$\begin{aligned} (P_{k'}(f))^{i,0}(x) &= \frac{1}{\hat{p}^{**}_k(x_k)} \bigg[\frac{1}{n} \sum _{i=1}^n \int _{u \in \mathcal X_{-k}(x_k)} \bigg\{f^{i,0}(u)+\sum _{j=1}^d f^{i,j}(u)(X_{ij}-u_j)\bigg\} \\ &\qquad \times (X_{ik}-x_k)\, K^{X_i}_h(X_i-u)\,\mathrm d u_{-k} \bigg]. \end{aligned}$$
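To make the defining formula for \(\mathcal P_0\) concrete, the following is a minimal numerical sketch for the case \(d=1\); it is only an illustration, not the authors' implementation. It assumes \(\mathcal X=[0,1]\), replaces the kernel \(K^{X_i}_h\) by a plain Gaussian kernel without boundary adjustment, and approximates the integral by the trapezoidal rule on a grid; all function and variable names are hypothetical.

```python
import numpy as np

def gauss_kernel(v, h):
    # Plain Gaussian kernel with bandwidth h (stand-in for K^{X_i}_h, no boundary correction).
    return np.exp(-0.5 * (v / h) ** 2) / (np.sqrt(2.0 * np.pi) * h)

def trapezoid(y, x):
    # Trapezoidal rule, written out to avoid depending on a particular NumPy version.
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))

def project_P0(f0, f1, X, h, grid):
    """Approximate (P_0 f)^{i,0} for d = 1.

    f0[i], f1[i] hold the functions f^{i,0} and f^{i,1} evaluated on `grid`;
    X contains the observed covariates X_1, ..., X_n.
    """
    n = len(X)
    total = 0.0
    for i in range(n):
        integrand = (f0[i] + f1[i] * (X[i] - grid)) * gauss_kernel(X[i] - grid, h)
        total += trapezoid(integrand, grid)
    return total / n   # constant component, the same for every i

# Toy example: f^{i,0}(u) = u and f^{i,1}(u) = 1 for all i.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=50)
grid = np.linspace(0.0, 1.0, 201)
f0 = np.tile(grid, (50, 1))
f1 = np.ones((50, grid.size))
print(project_P0(f0, f1, X, h=0.1, grid=grid))
```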

For the orthogonal projections of functions \(m \in \mathcal H_{add}\) one can use simplified formulas; in particular, these formulas can be used in our algorithm for updating functions \(m \in \mathcal H_{add}\). If \(m \in \mathcal H_{add}\) has components \(m_0,\dots ,m_d, m^{(1)}_1,\dots ,m^{(1)}_d\), the projections \(\mathcal P_0\), \(P_k\), \(\mathcal P_k\) and \(P_{k'}\) take the following form:

$$\begin{aligned} (\mathcal P_0(m)(x))^{i,0} &= m_0 + \sum _{j=1}^d \int _{\mathcal X_j} m_j^{(1)}(u_j)\, \hat{p}_j^*(u_j)\, \mathrm d u_j, \\ (P_k(m)(x))^{i,0} &= m_0 + m_k(x_k) + m_k^{(1)}(x_k)\frac{ \hat{p}^{*}_k(x_k) }{\hat{p}_k(x_k) } \\ &\qquad +\sum _{1 \le j \le d,\, j \ne k} \int _{\mathcal X_{-k,j}(x_k) } \bigg[m_j(u_j) \frac{ \hat{p}_{jk}(u_j,x_k)}{\hat{p}_k(x_k) } + m_j^{(1)}(u_j) \frac{ \hat{p}^{*}_{jk}(u_j,x_k)}{\hat{p}_k(x_k) } \bigg]\mathrm d u_{j},\\ (\mathcal P_k(m)(x))^{i,0} &= m_k(x_k) + m_k^{(1)}(x_k)\frac{ \hat{p}^{*}_k(x_k) }{\hat{p}_k(x_k) } - \sum _{1 \le j \le d} \int _{\mathcal X_{j} } m_j^{(1)}(u_j)\, \hat{p}^{*}_{j}(u_j)\, \mathrm d u_{j} \\ &\qquad +\sum _{1 \le j \le d,\, j \ne k} \int _{\mathcal X_{-k,j}(x_k) } \bigg[m_j(u_j) \frac{ \hat{p}_{jk}(u_j,x_k)}{\hat{p}_k(x_k) } + m_j^{(1)}(u_j) \frac{ \hat{p}^{*}_{jk}(u_j,x_k)}{\hat{p}_k(x_k) }\bigg] \mathrm d u_{j},\\ (P_{k'}(m)(x))^{i,0} &= m^{(1)}_k(x_k) + \big(m_0+ m_k(x_k)\big)\frac{ \hat{p}^{*}_k(x_k) }{\hat{p}^{**}_k(x_k) } \\ &\qquad +\sum _{1 \le j \le d,\, j \ne k} \int _{\mathcal X_{-k,j}(x_k) } \bigg[m_j(u_j) \frac{ \hat{p}^{*}_{kj}(x_k, u_j)}{\hat{p}^{**}_k(x_k) } + m_j^{(1)}(u_j) \frac{ \hat{p}^{**}_{jk}(u_j,x_k)}{\hat{p}^{**}_k(x_k) }\bigg] \mathrm d u_{j} , \end{aligned}$$

where for \(1\le j,k \le d\) with \(k \not =j\)

$$\begin{aligned} \hat{p}_{jk}(x_j,x_k) &= \frac{1}{n} \sum _{i=1}^n \int _{\mathcal X_{-(jk)}(x_j,x_k) } K^{X_i}_h(X_i-u)\,\mathrm d u_{-(jk)},\\ \hat{p}^{*}_{jk}(x_j,x_k) &= \frac{1}{n} \sum _{i=1}^n \int _{\mathcal X_{-(jk)}(x_j,x_k) } (X_{ij}-u_j)\, K^{X_i}_h(X_i-u)\,\mathrm d u_{-(jk)},\\ \hat{p}^{**}_{jk}(x_j,x_k) &= \frac{1}{n} \sum _{i=1}^n \int _{\mathcal X_{-(jk)}(x_j,x_k) }(X_{ij}-u_j)(X_{ik}-u_k)\, K^{X_i}_h(X_i-u)\,\mathrm d u_{-(jk)} \end{aligned}$$

with \(\mathcal X_{-(jk)}(x_j,x_k) =\{ u \in \mathcal X : u_j=x_j,\ u_k=x_k\}\), \(\mathcal X_{-k,j}(x_k)=\{ u\in \mathcal X_j : \text{there exists } v \in \mathcal X \text{ with } v_k=x_k \text{ and } v_j = u\}\), and \(u_{-(jk)}\) denoting the vector \((u_l: l \in \{1,\dots ,d\}\backslash \{j,k\} )\).
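The quantities \(\hat p_k\), \(\hat p^*_k\) and \(\hat p^{**}_k\) appearing above are the one-dimensional analogues of \(\hat p_{jk}\), \(\hat p^*_{jk}\) and \(\hat p^{**}_{jk}\). As a rough illustration (not taken from the paper): if \(K^{X_i}_h\) is a product kernel whose univariate factors each integrate to one over their domain, then integrating out the remaining coordinates leaves only the \(k\)-th factor, and these quantities reduce to ordinary local linear "moments" of a univariate kernel. The sketch below uses a plain Gaussian kernel as a stand-in, so it ignores boundary effects; names are hypothetical.

```python
import numpy as np

def gauss_kernel(v, h):
    return np.exp(-0.5 * (v / h) ** 2) / (np.sqrt(2.0 * np.pi) * h)

def local_linear_marginals(Xk, xk, h):
    """Return approximations of (p_k, p*_k, p**_k) at the point xk.

    Xk is the k-th covariate column (X_{1k}, ..., X_{nk}).  Under a product
    kernel whose factors integrate to one, integrating out the other
    coordinates in the definitions above leaves only these univariate sums.
    """
    w = gauss_kernel(Xk - xk, h)
    p0 = np.mean(w)                    # analogue of \hat p_k(x_k)
    p1 = np.mean(w * (Xk - xk))        # analogue of \hat p*_k(x_k)
    p2 = np.mean(w * (Xk - xk) ** 2)   # analogue of \hat p**_k(x_k)
    return p0, p1, p2

# Example call on simulated covariates.
rng = np.random.default_rng(1)
Xk = rng.uniform(0.0, 1.0, size=200)
print(local_linear_marginals(Xk, xk=0.5, h=0.1))
```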

Appendix 2: Proofs of Propositions 1 and 2

In this section we give the proofs of Propositions 1 and 2. They were used in Sect. 3 in the discussion of the existence of the smooth backfitting estimator and of the convergence of an algorithm for its calculation.

Proof (of Proposition 1)

\(\mathbf {{(ii) \Rightarrow (i)}}.\) Let \(g^{(n)}\in L\) be a Cauchy sequence. We must show \(\lim _{n \rightarrow \infty } g^{(n)} \in L\). By definition of L there exist sequences \(g_1^{(n)}\in L_1\) and \(g_2^{(n)}\in L_2\) such that \(g^{(n)}=g_1^{(n)}+g_2^{(n)}\). With (8), for \(i=1,2\) we obtain

$$ \left\Vert g_i^{(n)}-g_i^{(m)}\right\Vert \le \frac{1}{c}\left\Vert g^{(n)}-g^{(m)}\right\Vert \rightarrow 0. $$

Hence, \(g_1^{(n)}\) and \(g_2^{(n)}\) are Cauchy sequences. Since \(L_1\) and \(L_2\) are closed their limits are elements of \(L_1\subseteq L\) and \(L_2\subseteq L\), respectively. Thus,

$$ \lim _{n \rightarrow \infty } g^{(n)}= \lim _{n \rightarrow \infty } g_1^{(n)}+ \lim _{n \rightarrow \infty } g_2^{(n)} \in L. $$

\(\mathbf {{(i) \Rightarrow (iii)}}.\) We write \(\Pi _1\) for \(\Pi _1(L_2)\). Since L is closed, it is a Banach space. By the closed graph theorem, it suffices to show the following: if \(g^{(n)}\in L\) and \(\Pi _1 g^{(n)}\in L_1\) are convergent sequences with limits \(g, g_1\), then \(\Pi _1g=g_1\).

Let \(g^{(n)}\in L\) and \(\Pi _1 g^{(n)}\in L_1\) be sequences with limits g and \(g_1\), respectively. Write \(g^{(n)}=g_1^{(n)}+g_2^{(n)}\). Since

$$ \left\Vert g_2^{(n)}-g_2^{(m)}\right\Vert \le \left\Vert g_1^{(n)}-g_1^{(m)}\right\Vert +\left\Vert g^{(n)}-g^{(m)}\right\Vert $$

\(g_2^{(n)}\) is a Cauchy sequence converging to a limit \(g_2\in L_2\). We conclude \(g=g_1+g_2\), meaning \(\Pi _1g=g_1\).

\(\mathbf {{(iii) \Rightarrow (ii)}}.\) If \(\Pi _1\) is a bounded operator, then so is \(\Pi _2\), since \(\left\Vert g_2\right\Vert \le \left\Vert g\right\Vert +\left\Vert g_1\right\Vert \). Denote the corresponding operator norms by \(C_1\) and \(C_2\), respectively. Then

$$\max \{\left\Vert g_1\right\Vert ,\left\Vert g_2\right\Vert \}\le \max \{C_1,C_2\}\left\Vert g\right\Vert $$

which concludes the proof by choosing \(c=\frac{1}{\max \{C_1,C_2\}}\).

\(\mathbf {{(iii) \Leftrightarrow (iv)}}.\) This follows from

$$ \left\Vert \Pi _1\right\Vert = \sup _{g\in L} \frac{\left\Vert g_1\right\Vert }{\left\Vert g\right\Vert }=\sup _{g_1\in L_1, g_2 \in L_2} \frac{\left\Vert g_1\right\Vert }{\left\Vert g_1+g_2\right\Vert }= \sup _{g_1\in L_1} \frac{\left\Vert g_1\right\Vert }{\textrm{dist}(g_1,L_2)}=\frac{1}{\gamma (L_1,L_2)}. $$
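The equivalence \(\mathbf {{(iii) \Leftrightarrow (iv)}}\) can be checked by hand in \(\mathbb R^2\). The following sketch (an illustration only, with hypothetical names) takes two lines at angle t and verifies numerically that the norm of the oblique projection \(\Pi _1\) onto \(L_1\) along \(L_2\) equals \(1/\gamma (L_1,L_2)\).

```python
import numpy as np

# Illustration of (iii) <=> (iv) in R^2: the norm of the (generally oblique)
# projection Pi_1 onto L_1 along L_2 equals 1/gamma(L_1, L_2).
t = 0.4
e1 = np.array([1.0, 0.0])                  # spans L_1
e2 = np.array([np.cos(t), np.sin(t)])      # spans L_2
B = np.column_stack([e1, e2])              # basis of L = L_1 + L_2 = R^2

# Pi_1 g keeps only the L_1 coefficient of g = a*e1 + b*e2.
Pi1 = np.outer(e1, np.linalg.inv(B)[0])

op_norm = np.linalg.norm(Pi1, 2)                     # ||Pi_1|| = 1/|sin t|
gamma = np.linalg.norm(e1 - np.dot(e1, e2) * e2)     # dist(e1, L_2) = |sin t|
print(np.isclose(op_norm, 1.0 / gamma))              # True
```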

Lemma 9

Let \(L_1,L_2\) be closed subspaces of a Hilbert space. For \(\gamma \) defined as in Proposition 1 we have

$$\begin{aligned} \gamma (L_1,L_2)^2 = 1-\left\Vert \mathcal {P}_2\mathcal {P}_1\right\Vert ^2. \end{aligned}$$

Proof

$$\begin{aligned} \gamma (L_1,L_2)^2&=\inf _{g_1\in L_1,\left\Vert g_1\right\Vert =1}\left\Vert g_1-\mathcal {P}_2g_1\right\Vert ^2\\&=\inf _{g_1\in L_1,\left\Vert g_1\right\Vert =1}\langle g_1-\mathcal {P}_2g_1,g_1-\mathcal {P}_2g_1\rangle \\&=\inf _{g_1\in L_1,\left\Vert g_1\right\Vert =1}\langle g_1,g_1\rangle -\langle \mathcal {P}_2g_1,\mathcal {P}_2g_1\rangle \\&=1-\sup _{g_1\in L_1,\left\Vert g_1\right\Vert =1}\langle \mathcal {P}_2g_1,\mathcal {P}_2g_1\rangle \\&=1-\sup _{g\in L,\left\Vert g\right\Vert =1}\langle \mathcal {P}_2\mathcal {P}_1g,\mathcal {P}_2\mathcal {P}_1g\rangle \\&=1-\left\Vert \mathcal {P}_2\mathcal {P}_1\right\Vert ^2. \end{aligned}$$
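The identity of Lemma 9 is easy to verify numerically in a simple finite-dimensional situation. The following sketch (an illustration only, not part of the paper) takes two lines in \(\mathbb R^2\): there \(\gamma (L_1,L_2)\) equals the sine of the angle between the lines and \(\left\Vert \mathcal P_2\mathcal P_1\right\Vert\) its cosine, so indeed \(\gamma ^2 = 1-\left\Vert \mathcal P_2\mathcal P_1\right\Vert ^2\).

```python
import numpy as np

# Check gamma(L_1, L_2)^2 = 1 - ||P_2 P_1||^2 for two lines in R^2.
t = 0.7
e1 = np.array([1.0, 0.0])                 # unit vector spanning L_1
e2 = np.array([np.cos(t), np.sin(t)])     # unit vector spanning L_2
P1 = np.outer(e1, e1)                     # orthogonal projection onto L_1
P2 = np.outer(e2, e2)                     # orthogonal projection onto L_2

op_norm = np.linalg.norm(P2 @ P1, 2)      # spectral norm of P_2 P_1 (= |cos t|)
gamma = np.linalg.norm(e1 - P2 @ e1)      # dist(g_1, L_2) for the unit vector g_1 = e1 (= |sin t|)
print(np.isclose(gamma**2, 1.0 - op_norm**2))   # True
```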

Proof (of Proposition 2)

Let \(\mathcal P_j\) be the orthogonal projection onto \(L_j\). Following Lemma 9 we have

$$\begin{aligned} 1-\left\Vert {\mathcal P_2 \mathcal P_1}\right\Vert ^2=\gamma (L_1,L_2)^2. \end{aligned}$$

By Proposition 1, it suffices to show \(\left\Vert \mathcal P_2 \mathcal P_1\right\Vert <1\) in order to conclude that L is closed. Observe first that \(\left\Vert \mathcal P_2 \mathcal P_1\right\Vert \le 1\), because for \(g \in L\)

$$ \left\Vert \mathcal P_j g\right\Vert ^2= \langle \mathcal P_j g,\mathcal P_j g\rangle =\langle g,\mathcal P_j g\rangle \le \left\Vert g\right\Vert \left\Vert \mathcal P_j g\right\Vert , $$

which yields \(\left\Vert \mathcal P_j\right\Vert \le 1\) for \(j=1,2\) and hence \(\left\Vert \mathcal P_2 \mathcal P_1\right\Vert \le 1\). To show the strict inequality, note that if \(\mathcal P_2{_{\big | L_1}}\) is compact, so is \(\mathcal P_2 \mathcal P_1\), since the composition of a compact operator with a bounded operator is compact.

Thus, for every \(\varepsilon >0\), \(\mathcal P_2 \mathcal P_1\) has at most a finite number of eigenvalues greater than \(\varepsilon \). Since 1 is not an eigenvalue (an eigenvector with eigenvalue 1 would have to lie in \(L_1 \cap L_2\)), we conclude \(\left\Vert {\mathcal P_2 \mathcal P_1}\right\Vert <1\).
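The bound \(\left\Vert \mathcal P_2\mathcal P_1\right\Vert <1\) is also the kind of bound that drives the geometric convergence of projection-type iterations such as the algorithm discussed in Sect. 3: repeated application of \(\mathcal P_2\mathcal P_1\) contracts at a geometric rate. The sketch below illustrates this contraction for a plane and a line in \(\mathbb R^3\) with trivial intersection; it is a toy illustration of the abstract mechanism, not the smooth backfitting algorithm itself.

```python
import numpy as np

# Toy illustration: when ||P_2 P_1|| < 1, the iterates (P_2 P_1)^m x shrink
# geometrically.  Here L_1 is the xy-plane in R^3 and L_2 the line spanned by
# (1, 1, 1)/sqrt(3), so L_1 and L_2 intersect only in {0}.
P1 = np.diag([1.0, 1.0, 0.0])                  # orthogonal projection onto L_1
v = np.array([1.0, 1.0, 1.0]) / np.sqrt(3.0)
P2 = np.outer(v, v)                            # orthogonal projection onto L_2

T = P2 @ P1
print("||P_2 P_1|| =", np.linalg.norm(T, 2))   # about 0.816 < 1

x = np.array([0.3, -1.2, 0.7])
for m in range(1, 6):
    x = T @ x
    print(m, np.linalg.norm(x))                # norms decrease geometrically
```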

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG


Cite this paper

Hiabu, M., Mammen, E., Meyer, J.T. (2023). Local Linear Smoothing in Additive Models as Data Projection. In: Belomestny, D., Butucea, C., Mammen, E., Moulines, E., Reiß, M., Ulyanov, V.V. (eds) Foundations of Modern Statistics. FMS 2019. Springer Proceedings in Mathematics & Statistics, vol 425. Springer, Cham. https://doi.org/10.1007/978-3-031-30114-8_5
