A sparse estimate based on variational approximations for semiparametric generalized additive models

Yang, Fan; Yang, Yuehan

doi:10.1007/s00180-024-01485-2

Original Paper
Published: 25 March 2024

Computational Statistics (2024)

Abstract

In semiparametric regression, traditional methods such as mixed generalized additive models (GAM), computed via Laplace approximation or variational approximation using penalized marginal likelihood estimation, may not achieve sparsity and unbiasedness simultaneously, and may sometimes suffer from convergence problems. To address these issues, we propose an estimator for semiparametric generalized additive models based on the marginal likelihood. Our approach provides sparsity estimates and allows for statistical inference. To estimate and select variables, we use the smoothly clipped absolute deviation penalty (SCAD) within the framework of variational approximation. We also propose efficient iterative algorithms to obtain estimations. Simulation results support our theoretical characteristics, and we demonstrate that our method is more effective than the original variational approximations framework and many other penalized methods under certain conditions. Moreover, applications with actual data further demonstrate the superior performance of the proposed method.

References

Antoniadis A, Fan J (2001) Regularization of wavelet approximations. J Am Stat Assoc 96(455):939–967
Breslow NE, Clayton DG (1993) Approximate inference in generalized linear mixed models. J Am Stat Assoc 88(421):9–25
Brezger A, Lang S (2006) Generalized structured additive regression based on Bayesian P-splines. Comput Stat Data Anal 50(4):967–991
Chartrand R (2007) Exact reconstruction of sparse signals via nonconvex minimization. IEEE Signal Process Lett 14(10):707–710
Corbeil RR, Searle SR (1976) Restricted maximum likelihood (REML) estimation of variance components in the mixed model. Technometrics 18(1):31–38
Eilers PHC, Marx BD (1996) Flexible smoothing with B-splines and penalties. Stat Sci 11(2):89–121
Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96(456):1348–1360
Fan J, Peng H (2004) Nonconcave penalized likelihood with a diverging number of parameters. Ann Stat 32(3):928–961
Fan J, Xue L, Zou H (2014) Strong oracle optimality of folded concave penalized estimation. Ann Stat 42(3):819
Friedman JH, Stuetzle W (1981) Projection pursuit regression. J Am Stat Assoc 76(376):817–823
Friedman J, Hastie T, Tibshirani R (2010) Regularization paths for generalized linear models via coordinate descent. J Stat Softw 33(1):1
Hastie T, Tibshirani R (1987) Generalized additive models: some applications. J Am Stat Assoc 82(398):371–386
Hui FK, You C, Shang HL, Müller S (2019) Semiparametric regression using variational approximations. J Am Stat Assoc
Kauermann G, Krivobokova T, Fahrmeir L (2009) Some asymptotic results on generalized penalized spline smoothing. J R Stat Soc Ser B 71(2):487–503
Loh P-L, Wainwright MJ (2015) Regularized M-estimators with nonconvexity: statistical and algorithmic theory for local optima. J Mach Learn Res 16(1):559–616
Luts J, Wand MP (2015) Variational inference for count response semiparametric regression. Bayesian Anal 10(4):991–1023
Luts J, Broderick T, Wand MP (2014) Real-time semiparametric regression. J Comput Graph Stat 23(3):589–615
Lv Y-W, Yang G-H (2022) An adaptive cubature Kalman filter for nonlinear systems against randomly occurring injection attacks. Appl Math Comput 418:126834
McElreath R (2018) Statistical rethinking: a Bayesian course with examples in R and Stan. Chapman and Hall/CRC
Nelder JA, Wedderburn RW (1972) Generalized linear models. J R Stat Soc Ser A 135(3):370–384
Shang HL, Hui FK (2019) Package ‘vagam’
Wang Z, Liu H, Zhang T (2014) Optimal computational and statistical rates of convergence for sparse nonconvex learning problems. Ann Stat 42(6):2164
Wood S (2018) Mixed GAM computation vehicle with automatic smoothness estimation. R package version 1.8-12
Wood SN (2006) Generalized additive models: an introduction with R. Chapman and Hall/CRC
Wood S, Scheipl F (2017) gamm4: Generalized additive mixed models using ‘mgcv’ and ‘lme4’. R package version 0.2-5
Xu Z, Chang X, Xu F, Zhang H (2012) $L_{1/2}$ regularization: a thresholding representation theory and a fast solver. IEEE Trans Neural Netw Learn Syst 23(7):1013–1027
Zhang C-H (2010) Nearly unbiased variable selection under minimax concave penalty. Ann Stat 38(2):894–942
Zhou B, Gao J, Tran M-N, Gerlach R (2021) Manifold optimization-assisted Gaussian variational approximation. J Comput Graph Stat 30(4):946–957
Zou H, Li R (2008) One-step sparse estimates in nonconcave penalized likelihood models. Ann Stat 36(4):1509

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant No. 12371281); the Emerging Interdisciplinary Project, Program for Innovation Research, and the Disciplinary Funds of Central University of Finance and Economics.

Author information

Authors and Affiliations

School of Statistics and Mathematics, Central University of Finance and Economics, Beijing, China
Fan Yang & Yuehan Yang

Corresponding author

Correspondence to Yuehan Yang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Proof of Theorem 1

Set $\delta =d^{1/2}n^{-1/2}$, where $d = o(n^{1/4})$. The goal is to prove $\Vert {\hat{\theta }}-\theta ^0\Vert _2=O_p(\delta )$. Let $\theta = \theta ^0+\delta u$. It is then sufficient to prove that, for any $\varepsilon >0$, there exists a large constant C such that

$$\begin{aligned} P\Big \{\sup \limits _{\Vert u\Vert _2=C}l_{\lambda }\left( \theta ^0+\delta u,\text {vech}(A)\right) <l_{\lambda }\left( \theta ^0,\text {vech}(A)\right) \Big \} \ge 1-\varepsilon , \end{aligned}$$

which implies that, with probability at least $1-\varepsilon $, there is a local maximizer of $l_{\lambda }\left( \theta ,\text {vech}(A)\right) $ in the ball $\{\theta : \theta =\theta ^0+\delta u,\Vert u\Vert _2 \le C\} $. According to Lemma 1, when ${\hat{\theta }}$ lies in this ball, the covariance ${\hat{A}}$ estimated by the variational approximation (VA) satisfies ${\hat{A}} = O_p(1/n)$. For any $\theta $ in this ball, by Taylor expansion, we have

$$\begin{aligned} \beta _n(\theta )&\equiv l_{\lambda }\left( \theta ,\text {vech}(A)\right) -l_{\lambda }\left( \theta ^0,\text {vech}(A)\right) \\&=\left( \theta -\theta ^{0}\right) ^\mathrm {\scriptscriptstyle T} \nabla _{\theta } \underline{\ell }\left\{ \theta ^{0}, {\text {vech}}(A)\right\} -\frac{1}{2}\left( \theta -\theta ^{0}\right) ^\mathrm {\scriptscriptstyle T} \left[ -\nabla _{\theta }^{2} \underline{\ell }\left\{ \theta ^{0}, {\text {vech}}(A)\right\} \right] \left( \theta -\theta ^{0}\right) \\&~~~+\frac{1}{6} \sum _{r, s, t=1}^{{\text {dim}}(\theta )} \frac{\partial ^{3} \underline{\ell }\{\overline{\theta }, {\text {vech}}(A)\}}{\partial \theta _{r} \partial \theta _{s} \partial \theta _{t}}\left( \theta -\theta ^{0}\right) _{r}\left( \theta -\theta ^{0}\right) _{s}\left( \theta -\theta ^{0}\right) _{t}\\&~~~+n\sum _{l=1}^{p}\Big \{p_{\lambda _0}\big (\sqrt{\beta _l^{\mathrm {\scriptscriptstyle T} }H_l\beta _l}\big ) -p_{\lambda _0}\big (\sqrt{\beta _{0l}^{\mathrm {\scriptscriptstyle T} }H_l\beta _{0l}}\big )\Big \} \\&\triangleq T_{1}+T_{2}+T_{3}+T_{\lambda }. \end{aligned}$$

We already have $T_1=C\cdot O_p(d)$, $T_2\le -K^2\tau _{\min }d$, and $T_3=o_p(d)$. In addition, by the inequality $p_\lambda (|x|)-p_\lambda (|y|)\le \lambda |x-y|$ and the conditions $\lambda _0=o(1)$ and $\lambda _0=O\left( d^{1/2}n^{-1/2}\right) $, we have

$$\begin{aligned} T_\lambda&=n\sum _{l=1}^{p}\Big \{p_{\lambda _0}\Big (\sqrt{\beta _l^{\mathrm {\scriptscriptstyle T} }H_l\beta _l} \Big ) -p_{\lambda _0}\Big (\sqrt{\beta _{0l}^{\mathrm {\scriptscriptstyle T} }H_l\beta _{0l}}\Big )\Big \} \\&\le n\lambda _0 \sum _{l=1}^{p} \Big | \sqrt{\beta _l^{\mathrm {\scriptscriptstyle T} }H_l\beta _l} -\sqrt{\beta _{0l}^{\mathrm {\scriptscriptstyle T} }H_l\beta _{0l}} \Big | \\&= n\lambda _0\sum _{l=1}^{p}\frac{\Big | \beta _l^{\mathrm {\scriptscriptstyle T} }H_l\beta _l -\beta _{0l}^{\mathrm {\scriptscriptstyle T} }H_l\beta _{0l} \Big |}{\Big | \sqrt{\beta _l^{\mathrm {\scriptscriptstyle T} }H_l\beta _l} +\sqrt{\beta _{0l}^{\mathrm {\scriptscriptstyle T} }H_l\beta _{0l}} \Big |}. \end{aligned}$$
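The first inequality in the display above rests on the Lipschitz-type bound $p_\lambda (|x|)-p_\lambda (|y|)\le \lambda |x-y|$. As a numerical sanity check only (it is not part of the proof), the Python sketch below evaluates the SCAD penalty in the standard form of Fan and Li (2001) and looks for violations of the bound on random pairs. The function name `scad`, the default $a=3.7$, the value $\lambda =0.5$ and the simulated points are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def scad(t, lam, a=3.7):
    """SCAD penalty p_lam(|t|) in the Fan and Li (2001) form, elementwise."""
    t = np.abs(np.asarray(t, dtype=float))
    linear = lam * t                                           # |t| <= lam
    quad = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))   # lam < |t| <= a*lam
    const = lam**2 * (a + 1) / 2                               # |t| > a*lam
    return np.where(t <= lam, linear, np.where(t <= a * lam, quad, const))

rng = np.random.default_rng(0)
lam = 0.5  # illustrative tuning parameter
x = rng.normal(scale=2.0, size=10_000)
y = rng.normal(scale=2.0, size=10_000)

# gap > 0 would mean p(|x|) - p(|y|) exceeds lam * |x - y| for that pair.
gap = scad(x, lam) - scad(y, lam) - lam * np.abs(x - y)
max_violation = float(gap.max())
print("bound holds on all sampled pairs:", max_violation <= 1e-12)
```

The bound holds because the SCAD penalty is nondecreasing and concave on $[0,\infty )$ with slope at most $\lambda $, so a small tolerance is needed only for floating-point rounding.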

By Condition C6, there exists a constant $C_3$ such that $\big | \sqrt{\beta _l^{\mathrm {\scriptscriptstyle T} }H_l\beta _l}+\sqrt{\beta _{0l}^{\mathrm {\scriptscriptstyle T} }H_l\beta _{0l}} \big |>C_3$. Then

$$\begin{aligned} T_\lambda&\le nC\lambda _0\sum _{l=1}^{p}\frac{\Vert \beta _l-\beta _{0l}\Vert _2}{\sqrt{C_3}} =O_p\big (n\lambda _0d^{1/2}n^{-1/2}\big )\\&= O_p\big (\lambda _0d^{1/2}n^{1/2}\big )=O_p(d). \end{aligned}$$

In summary, $T_1=C \cdot O_p(d)$, $T_2\le -K^2\tau _{\min }d$, $T_3=o_p(d)$, and $T_\lambda =O_p(d)$. If we choose a sufficiently large constant K, $T_2$ dominates $T_1$, $T_3$ and $T_\lambda $. We thus conclude that there exists a local maximizer ${\hat{\theta }}$ of $l_{\lambda }\left( \theta ,\text {vech}(A)\right) $ in the ball $\{\theta : \theta =\theta ^0+\delta u,\Vert u\Vert _2 \le C \} $. Hence $\Vert {\hat{\theta }}-\theta ^0\Vert _2=O_p(\delta )=O_p(d^{1/2}n^{-1/2})$, completing the proof. $\square $

Proof of Theorem 2

By Theorem 1, ${\hat{\theta }}$ is a consistent estimator of $\theta ^0$: there exists a local maximizer ${\hat{\theta }}$ of ${l_{\lambda }\left( \theta ,\text {vech}(A)\right) }$ with convergence rate $d^{1/2}n^{-1/2}$. That is,

$$\begin{aligned} \left. \frac{\partial l_{\lambda }\left( \theta ,\text {vech}(A)\right) }{\partial \theta _{j}}\right| _{\theta ^\mathrm {\scriptscriptstyle T} = \big ({\hat{\alpha }}^\mathrm {\scriptscriptstyle T} , {\hat{\beta }}^\mathrm {\scriptscriptstyle T} \big )} =0 \quad \text {for } j=1, \dots , p+q. \end{aligned}$$

Note that ${\hat{\theta }}$ converges to $\theta ^0$ in probability. By Taylor expansion, when $j=1,\dots ,p$, the score function satisfies $\partial l_{\lambda }\left( \theta ,\text {vech}(A)\right) /\partial \theta _{j}=\partial l\left( \theta ,\text {vech}(A)\right) /\partial \theta _{j}$, because the penalty term does not contain the parameter component $\psi =(\alpha ,\phi )$. When $j=p+1,\dots ,q$, the following holds:

$$\begin{aligned} \partial l_{\lambda }\left( \theta ,\text {vech}(A)\right) /\partial \theta _{j}=\dfrac{\partial l\left( \theta ,\text {vech}(A)\right) }{\partial \theta _{j}} \Big |_{\theta =\left( {\hat{\alpha }},{\hat{\beta }} \right) }-n p_{\lambda }^{\prime }\left( \sqrt{{\hat{\beta }}_j^\mathrm {\scriptscriptstyle T} H_j{\hat{\beta }}_j}\right) H_j{\hat{\beta }}_j/\sqrt{{\hat{\beta }}_j^\mathrm {\scriptscriptstyle T} H_j{\hat{\beta }}_j}. \end{aligned}$$
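The penalty term in the display above is the gradient of $\beta _j\mapsto p_{\lambda }\big (\sqrt{\beta _j^\mathrm {\scriptscriptstyle T} H_j\beta _j}\big )$, which by the chain rule equals $p_{\lambda }^{\prime }(s)H_j\beta _j/s$ with $s=\sqrt{\beta _j^\mathrm {\scriptscriptstyle T} H_j\beta _j}$ when $H_j$ is symmetric. The Python sketch below compares that closed form with central finite differences for the SCAD penalty. The helper names, the randomly generated $H$, and the choices $\lambda =1$ and $a=3.7$ are hypothetical and used only for illustration; this is a check of the calculus, not the authors' algorithm.

```python
import numpy as np

def scad(t, lam, a=3.7):
    """SCAD penalty p_lam(t) for a scalar t >= 0 (Fan and Li, 2001)."""
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))
    return lam**2 * (a + 1) / 2

def scad_deriv(t, lam, a=3.7):
    """Derivative p'_lam(t) for a scalar t >= 0."""
    if t <= lam:
        return lam
    return max(a * lam - t, 0.0) / (a - 1)

def group_penalty(beta, H, lam):
    """p_lam(sqrt(beta' H beta)) for one coefficient block."""
    return scad(np.sqrt(beta @ H @ beta), lam)

def group_penalty_grad(beta, H, lam):
    """Closed-form gradient p'_lam(s) * H beta / s, assuming H is symmetric."""
    s = np.sqrt(beta @ H @ beta)
    return scad_deriv(s, lam) * (H @ beta) / s

rng = np.random.default_rng(1)
M = rng.normal(size=(4, 4))
H = M @ M.T + np.eye(4)                  # symmetric positive definite (illustrative)
beta = rng.normal(size=4)
beta *= 2.0 / np.sqrt(beta @ H @ beta)   # rescale so s = 2 lies in (lam, a*lam)
lam = 1.0

analytic = group_penalty_grad(beta, H, lam)
eps = 1e-6
numeric = np.array([
    (group_penalty(beta + eps * e, H, lam) - group_penalty(beta - eps * e, H, lam)) / (2 * eps)
    for e in np.eye(4)                   # rows of the identity are the unit directions
])
print("max abs difference:", np.max(np.abs(analytic - numeric)))
```

The block is rescaled so that $s$ falls in the quadratic region of SCAD, where the derivative is nonzero and the penalty is smooth, which keeps the finite-difference comparison meaningful.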

Therefore, for the linear part $\alpha $, the solution to $\partial l\left( \theta ,\text {vech}(A)\right) /\partial \theta _{j}=0$ is one and the same as the solution to $\partial l_{\lambda }\left( \theta ,\text {vech}(A)\right) /\partial \theta _{j}=0$, and we only need to prove that the linear part of the VA estimator is asymptotically normal. According to Hui et al. (2019), the VA estimators for normal, Poisson and Bernoulli responses satisfy:

$$\begin{aligned} n^{1 / 2} G\left( \hat{\theta }-\theta ^{0}\right) =\frac{1}{n^{1 / 2}} G \mathcal {I}^{-1}\left( \theta ^{0}\right) \nabla _{\theta } \ell \left( \theta ^{0}\right) +G \mathcal {I}^{-1}\left( \theta ^{0}\right) \delta , \end{aligned}$$

and the last term on the right-hand side is asymptotically negligible. Define $L_{i}$ such that $\sum _{i=1}^{n} L_{i}=n^{-1 / 2} G \mathcal {I}^{-1}\left( \theta ^{0}\right) \nabla _{\theta } \ell \left( \theta ^{0}\right) $. Then $E(L_i)=0$ and ${\text {Var}}\left( L_{i}\right) =n^{-1} G \mathcal {I}^{-1}\left( \theta ^{0}\right) G^\mathrm {\scriptscriptstyle T} $. By the multivariate Lindeberg-Feller central limit theorem, we have $n^{-1 / 2} G \mathcal {I}^{-1}\left( \theta ^{0}\right) \nabla _{\theta } \ell \left( \theta ^{0}\right) {\mathop {\longrightarrow }\limits ^{d}}N\left( 0,G \mathcal {I}^{-1}\left( \theta ^{0}\right) G^\mathrm {\scriptscriptstyle T} \right) $, completing the proof. $\square $

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yang, F., Yang, Y. A sparse estimate based on variational approximations for semiparametric generalized additive models. Comput Stat (2024). https://doi.org/10.1007/s00180-024-01485-2

Received: 22 June 2023
Accepted: 28 February 2024
Published: 25 March 2024
DOI: https://doi.org/10.1007/s00180-024-01485-2
