Partial linear modelling with multi-functional covariates

Aneiros, Germán; Vieu, Philippe

doi:10.1007/s00180-015-0568-8

Original Paper
Published: 05 March 2015

Volume 30, pages 647–671, (2015)
Computational Statistics

Germán Aneiros¹ &
Philippe Vieu²

Abstract

This paper contributes to the current literature on semi-parametric regression modelling for statistical samples composed of multi-functional data. A new kind of partially linear model (the so-called MFPLR model) is proposed. It allows for more than one functional covariate, incorporates both continuous and discrete effects of functional variables, and models these effects in either a nonparametric or a linear way. Exploiting the continuous nature of functional data, a new method for variable selection (the so-called PVS method) is proposed. In addition, new estimates of the various parameters involved in the partial linear model are constructed from this procedure. A simulation study illustrates the finite-sample behavior of the PVS procedure for selecting the influential variables. Real data analyses show how the method achieves the three main objectives of any semi-parametric procedure: first, the flexibility of the nonparametric component of the model yields good predictive behavior; second, the linear component yields interpretable outputs; third, the low computational cost ensures easy applicability. Although the method is intended for multi-functional problems, we briefly discuss how it can also be used in uni-functional problems as a boosting tool for improving predictive power. Finally, note that this paper is mainly of an applied nature, but some basic asymptotics are also stated in a final “Appendix”.

References

Aneiros G, Cao R, Vilar-Fernández JM, Muñoz-San-Roque A (2013) Functional prediction for the residual demand in electricity spot markets. IEEE Trans Power Syst 28(4):4201–4208
Aneiros G, Ferraty F, Vieu P (2014) Variable selection in partial linear regression with functional covariate. Statistics. doi:10.1080/02331888.2014.998675
Aneiros-Pérez G, Vieu P (2006) Semi-functional partial linear regression. Stat Probab Lett 76(11):1102–1110
Aneiros-Pérez G, Vieu P (2011) Automatic estimation procedure in partial linear model with functional data. Stat Pap 52(4):751–771
Aneiros-Pérez G, Vieu P (2013) Testing linearity in semi-parametric functional data analysis. Comput Stat 28(2):413–434
Aneiros G, Vieu P (2014) Variable selection in infinite-dimensional problems. Stat Probab Lett 94:12–20
Chen D, Hall P, Müller HG (2011) Single and multiple index functional regression models with nonparametric link. Ann Stat 39(3):1720–1747
Cuevas A (2014) A partial overview of the theory of statistics with functional data. J Stat Plann Inference 147:1–23
Du J, Zhang Z, Sun Z (2013) Variable selection for partially linear varying coefficient quantile regression model. Int J Biomath 6(3):14
Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96:1348–1360
Fan J, Peng H (2004) Nonconcave penalized likelihood with a diverging number of parameters. Ann Stat 32:928–961
Ferraty F, Goia A, Salinelli E, Vieu P (2013) Functional projection pursuit regression. Test 22:293–320
Ferraty F, Hall P, Vieu P (2010) Most-predictive design points for functional data predictors. Biometrika 97:807–824
Ferraty F, Laksaci A, Tadj A, Vieu P (2010) Rate of uniform consistency for nonparametric estimates with functional variables. J Stat Plann Inference 140:335–352
Ferraty F, Park J, Vieu P (2011) Estimation of a functional single index model. In: Recent advances in functional data analysis and related topics, Contrib Stat Springer, Heidelberg, pp 111–116
Ferraty F, Vieu P (2006) Nonparametric functional data analysis: theory and practice. Springer, New York
Gertheiss J, Maity A, Staicu AM (2013) Variable selection in generalized functional linear models. Stat 2:86–101
Goia A, Vieu P (2014) Some advances on semi-parametric functional data modelling. In: Contributions in infinite-dimensional statistics and related topics, Esculapio, Bologna
Goia A, Vieu P (2014) A partitioned single functional index model. Comput Stat. doi:10.1007/s00180-014-0530-1
Goldsmith J, Bobb J, Crainiceanu C, Caffo B, Reich D (2011) Penalized functional regression. J Comput Graph Stat 20:830–851
Guo J, Tang M, Tian M, Zhu K (2013) Variable selection in high-dimensional partially linear additive models for composite quantile regression. Comput Stat Data Anal 65:56–67
Härdle W, Liang H, Gao J (2000) Partially linear models. Physica-Verlag, Heidelberg
Härdle W, Liang H (2007) Statistical methods for biostatistics and related fields. Springer, Berlin, pp 87–103
Hong Z, Hu Y, Lian H (2013) Variable selection for high-dimensional varying coefficient partially linear models via nonconcave penalty. Metrika 76(7):887–908
Hu Y, Lian H (2013) Variable selection in a partially linear proportional hazards model with a diverging dimensionality. Stat Probab Lett 83(1):61–69
Huang J, Xie H (2007) Asymptotic oracle properties of SCAD-penalized least squares estimators. Asymptotics: particles, processes and inverse problems. In: IMS Lecture Notes-Monograph Series. 55, pp 149–166
Hunter DR, Li RA (2005) Variable selection using MM algorithms. Ann Stat 33(4):1617–1642
Kneip A, Poss D, Sarda P. Functional linear regression with points of impact. (Preprint)
Lian H (2011) Functional partial linear model. J Nonparametr Stat 23(1):115–128
Liang H, Härdle W, Carroll RJ (1999) Estimation in a semiparametric partially linear errors-in-variables model. Ann Stat 27(5):1519–1535
Maity A, Huang JZ (2012) Partially linear varying coefficient models stratified by a functional covariate. Stat Probab Lett 82(10):1807–1814
McKeague IW, Sen B (2010) Fractals with point impact in functional linear regression. Ann Stat 38:2559–2586
Ni X, Zhang HH, Zhang D (2009) Automatic model selection for partially linear models. J Multivar Anal 100:2100–2111
Pateiro-López B, González-Manteiga W (2006) Multivariate partially linear models. Stat Probab Lett 76:1543–1549
Rachdi M, Vieu P (2007) Nonparametric regression for functional data: automatic smoothing parameter selection. J Stat Plann Inference 137(9):2784–2801
Robinson PM (1988) Root-n-consistent semiparametric regression. Econometrica 56(4):931–954
Speckman P (1988) Kernel smoothing in partial linear models. J R Stat Soc Ser B 50(3):413–436
Wang H, Zou G, Wan A (2013) Adaptive LASSO for varying-coefficient partially linear measurement error models. J Stat Plann Inference 143(1):40–54
Xia Y, Härdle W (2006) Semi-parametric estimation of partially linear single-index models. J Multivar Anal 97(5):1162–1184
Xie H, Huang J (2009) SCAD-penalized regression in high-dimensional partially linear models. Ann Stat 37(2):673–696
Zhang J, Wang T, Zhu L, Liang H (2012) A dimension reduction based approach for estimating and variable selection in partially linear single-index models with high-dimensional covariates. Electron J Stat 6:2235–2273
Zhang R, Zhao W, Liu J (2013) Robust estimation and variable selection for semiparametric partially linear varying coefficient model based on modal regression. J Nonparametr Stat 25(2):523–544
Zou H, Li R (2008) One-step sparse estimates in nonconcave penalized likelihood models. Ann Stat 36(4):1509–1533

Acknowledgments

The authors wish to express their great gratitude to the Editors and the Reviewers, who provided very interesting comments. Their suggestions were of great help when revising this work and will certainly increase its impact. In particular, the reviewing procedure was the opportunity for highly interesting exchanges with one Referee about the interest and the meaning of our model (see Remark 2), which have greatly contributed to improving this work, at least by pointing out some challenging issues for the future. The research of G. Aneiros was partially supported by Grants MTM2011-22392 and CN2012/130 from Spanish Ministerio de Economía y Competitividad and Xunta Galicia, respectively.

Author information

Authors and Affiliations

Departamento de Matemáticas, Universidad de A Coruña, A Coruña, Spain
Germán Aneiros
Institut de Mathématiques, Université Paul Sabatier, Toulouse, France
Philippe Vieu

Corresponding author

Correspondence to Philippe Vieu.

Appendix

(A) Conditions on the model. In addition to both the general conditions (1), (2), (3), (4) and (7) and the specific ones (5) and (6), minor additional technical assumptions on the model include regularity on the observed grid of the curve $\chi $:
$$\begin{aligned} \exists c_1,c_2, \forall j=1,\ldots ,p_n -1, 0<c_1p_n^{-1}<t_{j+1}-t_j<c_2p_n^{-1}< \infty , \end{aligned}$$
(18)
as well as smoothness and boundedness conditions on $\chi $:
$$\begin{aligned} \chi { \text{ is } \text{ Lipschitz } \text{ continuous } \text{ on } \text{ its } \text{ support }} \end{aligned}$$
(19)
and
$$\begin{aligned} \exists \eta , \ \forall t\in [C,D], \ |\chi (t)|\ge \eta >0. \end{aligned}$$
(20)
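
As a practical aside, condition (18) can be inspected on a given discretisation grid. The following Python sketch is illustrative only and not part of the original paper: the function names are hypothetical, and since (18) is an asymptotic requirement (the constants $c_1$ and $c_2$ must not depend on $n$), evaluating it on a single grid is merely a diagnostic.

```python
import numpy as np


def grid_regularity_constants(t):
    """Return the smallest and largest scaled gap p_n * (t_{j+1} - t_j).

    Condition (18) asks that these scaled gaps stay inside a fixed
    interval (c1, c2) as the grid size p_n grows.
    """
    t = np.asarray(t, dtype=float)
    p_n = t.size
    gaps = np.diff(t)
    if np.any(gaps <= 0):
        raise ValueError("grid points must be strictly increasing")
    scaled = p_n * gaps
    return scaled.min(), scaled.max()


def satisfies_condition_18(t, c1, c2):
    """Check c1 / p_n < t_{j+1} - t_j < c2 / p_n for every j (hypothetical helper)."""
    lo, hi = grid_regularity_constants(t)
    return bool(0 < c1 < lo and hi < c2)
```

An equispaced grid yields scaled gaps close to 1, whereas a grid that clusters its points near one end of the interval produces very small scaled gaps and fails the check for any reasonable $c_1$.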
(B) Conditions on the linear estimate and the variable selection. Let us consider the partial linear regression model
$$\begin{aligned} Y=\sum _{j \in {\mathcal {P}}_n} \gamma ^jX^j + m(\zeta ) + \epsilon , \end{aligned}$$
(21)
where ${\mathcal {P}}_n \subset \{1,\ldots ,p_n\}$ with $\#{\mathcal {P}}_n=O(\omega _n)$ or $\#{\mathcal {P}}_n=O(s_n)$. Let us assume that the standard SCAD-penalized least squares procedure leads to estimates $\tilde{\gamma }^j$ of $\gamma ^j$ satisfying the following properties:
$$\begin{aligned} P\Bigl (\{{j \in {\mathcal {P}}_n},\gamma ^j=0\}=\{{j \in {\mathcal {P}}_n},\tilde{\gamma }^j=0\} \Bigr ) \rightarrow 1 { \text{ as } } n \rightarrow \infty \end{aligned}$$
(22)
and
$$\begin{aligned} \exists \theta \ge 0,\quad ||\tilde{\gamma } - \gamma || = O_p (n^{-1/2}(\#{\mathcal {P}}_n)^\theta ), \end{aligned}$$
(23)
where we have denoted $\gamma =(\gamma ^j,j\in {\mathcal {P}}_n)^{\prime }$.

Remark 5

As noted in Remark 3, suitable conditions under which (22) and (23) hold can be seen in Aneiros et al. (2014).
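
For readers unfamiliar with the SCAD penalty referred to in (B), the following Python sketch evaluates the penalty function and its derivative as defined in Fan and Li (2001). It is an illustration under stated assumptions, not the implementation used in the paper: the function names are hypothetical, and the default $a=3.7$ is the value suggested by Fan and Li (2001), not a choice stated in this appendix.

```python
import numpy as np


def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty p_lam(|theta|) of Fan and Li (2001), evaluated elementwise."""
    if lam <= 0 or a <= 2:
        raise ValueError("requires lam > 0 and a > 2")
    t = np.abs(np.asarray(theta, dtype=float))
    linear = lam * t                                        # |theta| <= lam
    quadratic = (2 * a * lam * t - t ** 2 - lam ** 2) / (2 * (a - 1))
    constant = lam ** 2 * (a + 1) / 2                       # |theta| > a * lam
    return np.where(t <= lam, linear,
                    np.where(t <= a * lam, quadratic, constant))


def scad_derivative(theta, lam, a=3.7):
    """Derivative of the SCAD penalty with respect to |theta|."""
    if lam <= 0 or a <= 2:
        raise ValueError("requires lam > 0 and a > 2")
    t = np.abs(np.asarray(theta, dtype=float))
    tail = np.maximum(a * lam - t, 0.0) / (a - 1)           # decays to 0 at a * lam
    return np.where(t <= lam, lam, tail)
```

The penalty grows linearly near zero (which produces sparsity) and becomes constant beyond $a\lambda$ (which avoids shrinking large coefficients), the two features behind oracle-type properties such as (22) and (23).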

(C) Conditions on the nonparametric estimate. Let us consider the nonparametric models
$$\begin{aligned} Y^{*}=m(\zeta ) + \epsilon \end{aligned}$$
and
$$\begin{aligned} X^{j}=g_j(\zeta ) + \eta _j,\quad \ j=1,\ldots ,p_n, \end{aligned}$$
and let $\widehat{m}^{*}(z)$ and $\widehat{g}_{j}(z)$ denote the corresponding nonparametric estimates for $m(z)$ and $g_j(z)$, respectively, obtained from the models above by using the same kind of weights as those used in the PVS procedure. Let us assume that
$$\begin{aligned}&\sup _{z\in \mathcal {C}}\left| \widehat{m}^{*}(z)-m(z)\right| =O_{p}\left( a_n \right) , \end{aligned}$$
(24)

$$\begin{aligned}&\sup _{z\in \mathcal {C}, j\in {\mathcal {S}}_{n}}|\widehat{g} _{j}(z)-g_{j}(z)|=O_{p}\left( b_n \right) \end{aligned}$$
(25)
and
$$\begin{aligned} \sup _{z\in \mathcal {C},j\in {\mathcal {S}}_{n}}|g_{j}(z)|=O\left( 1\right) . \end{aligned}$$
(26)

Remark 6

Specific rates of convergence for nonparametric estimators with functional covariate can be seen in Ferraty et al. (2010).
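
To make the kind of estimate appearing in (C) concrete, the sketch below implements a kernel regression estimate with a functional covariate. The weights of the PVS procedure are not reproduced in this appendix, so everything specific here is an assumption made for illustration: Nadaraya-Watson weights, an $L^2$ distance between discretised curves, a quadratic kernel on $[0,1]$, and all function names.

```python
import numpy as np


def l2_distance(curves_a, curves_b, grid):
    """Pairwise L2 distances between discretised curves (one curve per row)."""
    grid = np.asarray(grid, dtype=float)
    dt = np.diff(grid)
    w = np.zeros_like(grid)          # trapezoidal quadrature weights
    w[:-1] += dt / 2
    w[1:] += dt / 2
    diff = curves_a[:, None, :] - curves_b[None, :, :]
    return np.sqrt((diff ** 2) @ w)


def nw_weights(dist, h):
    """Nadaraya-Watson weights with the kernel K(u) = 1 - u^2 on [0, 1]."""
    u = dist / h
    k = np.where(u <= 1.0, 1.0 - u ** 2, 0.0)
    denom = k.sum(axis=1, keepdims=True)
    if np.any(denom == 0):
        raise ValueError("bandwidth too small: a curve has no neighbour")
    return k / denom


def functional_nw(z_new, z_train, y_train, grid, h):
    """Kernel estimate of E[Y | zeta = z] at each row of z_new."""
    return nw_weights(l2_distance(z_new, z_train, grid), h) @ y_train
```

Applying `functional_nw` with responses $Y^{*}$ gives an estimate of the type $\widehat{m}^{*}$, and applying it with responses $X^{j}$ gives one of the type $\widehat{g}_{j}$; the bandwidth `h` plays the role of $h$ in Theorem 1.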

(D) Some asymptotics.

Theorem 1

Under conditions (1)–(7) and (18)–(23), and if $\omega _n\rightarrow \infty $ as $n \rightarrow \infty $, one has

$$\begin{aligned} ||\hat{\alpha } - \alpha || = O_p (n^{-1/2}s_n^{\theta }) \end{aligned}$$

(27)

and

$$\begin{aligned} P(\hat{\mathcal {S}}_n={\mathcal {S}}_n) \rightarrow 1 { \text{ as } } n \rightarrow \infty . \end{aligned}$$

(28)

If in addition conditions (24), (25) and (26) hold, and $h\rightarrow 0$ and $b_n \rightarrow 0$ as $n \rightarrow \infty $, then

$$\begin{aligned} \sup _ {z \in \mathcal {C} } |\hat{m}(z)-m(z)| =O_{p}\left( a_n \right) +O_p(n^{-1/2}s_n^{1/2+\theta }). \end{aligned}$$

(29)
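
To illustrate how the two quantities controlled in Theorem 1 fit together, the following Python sketch computes a classical unpenalized two-step estimate in the spirit of Speckman (1988) and Robinson (1988). It is not the PVS procedure: it omits the SCAD penalization and the variable selection, and simply assumes the relevant covariates are already known. The function names and the Gaussian smoother on a scalar stand-in for $\zeta$ are hypothetical choices made to keep the example self-contained.

```python
import numpy as np


def gaussian_smoother(z, h):
    """Row-normalised Gaussian kernel weights for a scalar stand-in covariate z."""
    z = np.asarray(z, dtype=float)
    u = (z[:, None] - z[None, :]) / h
    k = np.exp(-0.5 * u ** 2)
    return k / k.sum(axis=1, keepdims=True)


def partial_linear_fit(X, y, W):
    """Two-step estimate of (alpha, m) in Y = X alpha + m(zeta) + eps.

    X : (n, s) array of scalar covariates assumed to be relevant
    y : (n,) responses
    W : (n, n) smoother matrix whose rows are nonparametric weights in zeta
    """
    X_res = X - W @ X                 # X^j minus its smooth g_hat_j(zeta_i)
    y_res = y - W @ y                 # Y minus its smooth
    alpha_hat, *_ = np.linalg.lstsq(X_res, y_res, rcond=None)
    m_hat = W @ (y - X @ alpha_hat)   # smooth the partial residuals
    return alpha_hat, m_hat
```

In this sketch the fitted nonparametric component differs from the smooth of $Y-X\alpha$ exactly by $W X(\alpha -\hat{\alpha })$, so its error combines a purely nonparametric term with a term driven by $\Vert \hat{\alpha }-\alpha \Vert $, which mirrors the two terms in (29).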

(E) Some remarks. It is illustrative to compare the rates obtained in Theorem 1 with those of a standard procedure. To the best of our knowledge, Aneiros et al. (2014) is at present the only paper devoted to variable selection in partial linear models with a functional covariate in the nonparametric component. Thus, the PVS procedure has to be compared with the proposal in Aneiros et al. (2014). To do so, one must first make sure that (22) and (23) hold in the specific setting where the covariates $X^j$ come from a curve. If, in such a setting, one restricts to fixed $s_n=s<\infty $, the conditions imposed in Aneiros et al. (2014) are trivially fulfilled, and (22) and (23) hold (with $\theta =0$). Thus, the PVS and the standard SCAD-penalized least squares estimators of the parametric component $\alpha $ converge at the same rate ($O_p(n^{-1/2})$). In addition, the corresponding nonparametric estimators of the functional $m$ achieve the common uniform rate
$$\begin{aligned} O_{p}\left( h^{\delta }+\sqrt{1/(n\Phi _{n})}\right) \end{aligned}$$
where we have denoted $\Phi _{n}=\phi (h)/\psi _{\mathcal {C}}\left( 1/n\right) $ with $\phi (\cdot )$ and $\psi _{\mathcal {C}}(\cdot )$ being the small ball probability function and the Kolmogorov entropy associated to $\mathcal {C}$, respectively, corresponding to the functional variable $\zeta $, and $\delta $ denotes the order of a Hölder condition imposed on the various smooth functions associated to the partial linear regression model. Finally, it is worth noting that more research is needed to obtain better rates for the PVS procedure than for the standard one. Basically, it is sufficient to establish (22) and (23) for some $\theta >0$ when the covariates come from a curve.
(F) Sketch of the proofs. Proofs of both (27) and (28) can be obtained by using the same techniques as those in Aneiros and Vieu (2014) while, by means of (27) and (28), it is easy to get (29). Thus, in order to save space, only sketches of the corresponding proofs are presented here.

Proof of (27)

Considering

$$\begin{aligned}{\mathcal {P}}_n={\mathcal {R}}^{*}_n =: \{j=1,\ldots ,p_n, \ { \text{ such } \text{ that } } X^j \in {\mathcal {R}}_n\} \end{aligned}$$

in (23), and taking into account that $\#{\mathcal {R}}_n=O(s_n)$ with probability 1, arguments similar to those in Aneiros and Vieu (2014) can be used to obtain that

$$\begin{aligned} \sum _{j\in {\mathcal {R}}^{*}_n} (\tilde{\gamma }^j -\gamma ^j)^2 = O_p(n^{-1}s_n^{2\theta }). \end{aligned}$$

(30)

On the other hand, if now one considers

$$\begin{aligned} {\mathcal {P}}_n={\mathcal {Q}}^{*}_n =: \{j=1,\ldots ,p_n, \ { \text{ such } \text{ that } } X^j \in {\mathcal {Q}}_n\} \end{aligned}$$

in (23) and applies both condition (22) and Lemma 2 in Aneiros and Vieu (2014), one gets

$$\begin{aligned} \sum _{j\notin {\mathcal {R}}^{*}_n, j \in {\mathcal {S}}_n} (\hat{\alpha }^j -\alpha ^j)^2= O_p(n^{-1}s_n^{2\theta }). \end{aligned}$$

(31)

Finally, (30) and (31) together with the fact that

$$\begin{aligned} ||\hat{\alpha } - \alpha ||^2 = \sum _{j\in {\mathcal {R}}^{*}_n} (\tilde{\gamma }^j -\gamma ^j)^2 + \sum _{j\notin {\mathcal {R}}^{*}_n, j \in {\mathcal {S}}_n} (\hat{\alpha }^j -\alpha ^j)^2 \end{aligned}$$

conclude the proof.$\square $

Proof of (28)

In a similar way as in Aneiros and Vieu (2014), considering ${\mathcal {P}}_n={\mathcal {R}}^{*}_n$ in (22) one obtains that

$$\begin{aligned} A_{n,1}=:P(\exists j\in {\mathcal {R}}^{*}_n, \gamma ^j\ne 0 { \text{ and } } \tilde{\gamma }^j =0) \rightarrow 0 { \text{ as } } n\rightarrow \infty \end{aligned}$$

(32)

and

$$\begin{aligned} A_{n,2}=:P(\exists j\in {\tilde{S}}_{{\mathcal {R}}^{*}_n} { \text{ such } \text{ that } } \gamma ^j = 0) \rightarrow 0 { \text{ as } } n\rightarrow \infty , \end{aligned}$$

(33)

where we have denoted $\tilde{S}_{{\mathcal {R}}^{*}_n} = \{j \in {\mathcal {R}}^{*}_n, \tilde{\gamma }^j \ne 0\}.$ In addition, from Lemma 2 in Aneiros and Vieu (2014) together with condition (22) applied when ${\mathcal {P}}_n={\mathcal {Q}}^{*}_n$ is considered, one gets that

$$\begin{aligned} A_{n,3}=:P(\exists j \notin {\tilde{S}}_{{\mathcal {R}}^{*}_n} { \text{ such } \text{ that } } \alpha ^j\ne 0 { \text{ and } } \tilde{\beta }^{k_j}= 0) \rightarrow 0 { \text{ as } } n \rightarrow \infty \end{aligned}$$

(34)

where, to avoid confusing the estimators in (32)–(33) and (34), notation $\beta $ was used instead of $\gamma $. Finally, the claimed result (28) follows directly from (32)–(34) together with the fact that

$$\begin{aligned} P(\hat{\mathcal {S}}_n\ne {\mathcal {S}}_n) \le A_{n,1} + A_{n,2} +A_{n,3}. \end{aligned}$$

$\square $

Proof of (29)

It is easy to obtain that

$$\begin{aligned} \left| \widehat{m}(z)-m(z)\right| \le \left| \widehat{m}^{*}(z)-m(z)\right| +(\#(\widehat{{\mathcal {S}}}_{n}\cup {\mathcal {S}}_{n}))^{1/2}(B_{n,1} + B_{n,2}), \end{aligned}$$

(35)

where we have denoted

$$\begin{aligned} B_{n,1}=\sup _{u\in \mathcal {C},j\in \widehat{{\mathcal {S}}}_{n}\cup {\mathcal {S}}_{n}}|\widehat{g}_{j}(u)-g_{j}(u)|\left\| \widehat{\varvec{\alpha }}-{\varvec{\alpha }}\right\| \end{aligned}$$

and

$$\begin{aligned} B_{n,2}=\sup _{u\in \mathcal {C},j\in \widehat{{\mathcal {S}}}_{n}\cup {\mathcal {S}}_{n}}|g_{j}(u)|\left\| \widehat{\varvec{\alpha }} - \varvec{\alpha }\right\| . \end{aligned}$$

In addition, from condition (3) and result (28) one has that

$$\begin{aligned} \#(\widehat{{\mathcal {S}}}_{n}\cup {\mathcal {S}}_{n})=O_p(s_n) \end{aligned}$$

(36)

and, using the condition (26), one gets

$$\begin{aligned} \sup _{u\in \mathcal {C},j\in \widehat{{\mathcal {S}}}_{n}\cup {\mathcal {S}}_{n}}|g_{j}(u)|=O_p(1). \end{aligned}$$

(37)

Finally, results (27) and (35)–(37) together with conditions (24) and (25) and the facts that $h\rightarrow 0$ and $b_n\rightarrow 0$ as $n \rightarrow \infty $ give the claimed result.$\square $

About this article

Cite this article

Aneiros, G., Vieu, P. Partial linear modelling with multi-functional covariates. Comput Stat 30, 647–671 (2015). https://doi.org/10.1007/s00180-015-0568-8

Received: 20 December 2013
Accepted: 11 February 2015
Published: 05 March 2015
Issue Date: September 2015
DOI: https://doi.org/10.1007/s00180-015-0568-8

JEL Classification

C14
