Abstract
This paper contributes to the current literature on semi-parametric regression modelling for statistical samples composed of multi-functional data. A new kind of partially linear model (the so-called MFPLR model) is proposed. It allows for more than one functional covariate, incorporates both continuous and discrete effects of functional variables, and models these effects in both a nonparametric and a linear way. Based on the continuous nature of functional data, a new method for variable selection (the so-called PVS method) is proposed. In addition, new estimates of the various parameters involved in the partial linear model are constructed from this procedure. A simulation study illustrates the finite-sample behavior of the PVS procedure for selecting the influential variables. Real data analyses show how the method meets the three main objectives of any semi-parametric procedure. Firstly, the flexibility of the nonparametric component of the model yields good predictive behavior; secondly, the linear component of the model yields interpretable outputs; thirdly, the low computational cost ensures easy applicability. Although the method is intended for multi-functional problems, the paper briefly discusses how it can also be used in uni-functional problems as a boosting tool for improving predictive power. Finally, note that the paper is mainly of an applied nature, but some basic asymptotics are also stated in a final “Appendix”.
References
Aneiros G, Cao R, Vilar-Fernández JM, Muñoz-San-Roque A (2013) Functional prediction for the residual demand in electricity spot markets. IEEE Trans Power Syst 28(4):4201–4208
Aneiros G, Ferraty F, Vieu P (2014) Variable selection in partial linear regression with functional covariate. Statistics. doi:10.1080/02331888.2014.998675
Aneiros-Pérez G, Vieu P (2006) Semi-functional partial linear regression. Stat Probab Lett 76(11):1102–1110
Aneiros-Pérez G, Vieu P (2011) Automatic estimation procedure in partial linear model with functional data. Stat Pap 52(4):751–771
Aneiros-Pérez G, Vieu P (2013) Testing linearity in semi-parametric functional data analysis. Comput Stat 28(2):413–434
Aneiros G, Vieu P (2014) Variable selection in infinite-dimensional problems. Stat Probab Lett 94:12–20
Chen D, Hall P, Müller HG (2011) Single and multiple index functional regression models with nonparametric link. Ann Stat 39(3):1720–1747
Cuevas A (2014) A partial overview of the theory of statistics with functional data. J Stat Plann Inference 147:1–23
Du J, Zhang Z, Sun Z (2013) Variable selection for partially linear varying coefficient quantile regression model. Int J Biomath 6(3):14
Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96:1348–1360
Fan J, Peng H (2004) Nonconcave penalized likelihood with a diverging number of parameters. Ann Stat 32:928–961
Ferraty F, Goia A, Salinelli E, Vieu P (2013) Functional projection pursuit regression. Test 22:293–320
Ferraty F, Hall P, Vieu P (2010) Most-predictive design points for functional data predictors. Biometrika 97:807–824
Ferraty F, Laksaci A, Tadj A, Vieu P (2010) Rate of uniform consistency for nonparametric estimates with functional variables. J Stat Plann Inference 140:335–352
Ferraty F, Park J, Vieu P (2011) Estimation of a functional single index model. In: Recent advances in functional data analysis and related topics, Contrib Stat Springer, Heidelberg, pp 111–116
Ferraty F, Vieu P (2006) Nonparametric functional data analysis: theory and practice. Springer, New York
Gertheiss J, Maity A, Staicu AM (2013) Variable selection in generalized functional linear models. Stat 2:86–101
Goia A, Vieu P (2014) Some advances on semi-parametric functional data modelling. In: Contributions in infinite-dimensional statistics and related topics, Esculapio, Bologna
Goia A, Vieu P (2014) A partitioned single functional index model. Comput Stat. doi:10.1007/s00180-014-0530-1
Goldsmith J, Bobb J, Crainiceanu C, Caffo B, Reich D (2011) Penalized functional regression. J Comput Graph Stat 20:830–851
Guo J, Tang M, Tian M, Zhu K (2013) Variable selection in high-dimensional partially linear additive models for composite quantile regression. Comput Stat Data Anal 65:56–67
Härdle W, Liang H, Gao J (2000) Partially linear models. Physica-Verlag, Heidelberg
Härdle W, Liang H (2007) Statistical methods for biostatistics and related fields. Springer, Berlin, pp 87–103
Hong Z, Hu Y, Lian H (2013) Variable selection for high-dimensional varying coefficient partially linear models via nonconcave penalty. Metrika 76(7):887–908
Hu Y, Lian H (2013) Variable selection in a partially linear proportional hazards model with a diverging dimensionality. Stat Probab Lett 83(1):61–69
Huang J, Xie H (2007) Asymptotic oracle properties of SCAD-penalized least squared estimators. Asymptotics: particles, processes and inverse problems. In: IMS Lecture Notes-Monograph Series. 55, pp 149–166
Hunter DR, Li RA (2005) Variable selection using MM algorithms. Ann Stat 33(4):1617–1642
Kneip A, Poss D, Sarda P. Functional linear regression with points of impact. (Preprint)
Lian H (2011) Functional partial linear model. J Nonparametr Stat 23(1):115–128
Liang H, Härdle W, Carroll RJ (1999) Estimation in a semiparametric partially linear errors-in-variables model. Ann Stat 27(5):1519–1535
Maity A, Huang JZ (2012) Partially linear varying coefficient models stratified by a functional covariate. Stat Probab Lett 82(10):1807–1814
McKeague IW, Sen B (2010) Fractals with point impact in functional linear regression. Ann Stat 38:2559–2586
Ni X, Zhang HH, Zhang D (2009) Automatic model selection for partially linear models. J Multivar Anal 100:2100–2111
Pateiro-López B, González-Manteiga W (2006) Multivariate partially linear models. Stat Probab Lett 76:1543–1549
Rachdi M, Vieu P (2007) Nonparametric regression for functional data: automatic smoothing parameter selection. J Stat Plann Inference 137(9):2784–2801
Robinson PM (1988) Root-n-consistent semiparametric regression. Econometrica 56(4):931–954
Speckman P (1988) Kernel smoothing in partial linear models. J R Stat Soc Ser B 50(3):413–436
Wang H, Zou G, Wan A (2013) Adaptive LASSO for varying-coefficient partially linear measurement error models. J Stat Plann Inference 143(1):40–54
Xia Y, Härdle W (2006) Semi-parametric estimation of partially linear single-index models. J Multivar Anal 97(5):1162–1184
Xie H, Huang J (2009) SCAD-penalized regression in high-dimensional partially linear models. Ann Stat 37(2):673–696
Zhang J, Wang T, Zhu L, Liang H (2012) A dimension reduction based approach for estimating and variable selection in partially linear single-index models with high-dimensional covariates. Electron J Stat 6:2235–2273
Zhang R, Zhao W, Liu J (2013) Robust estimation and variable selection for semiparametric partially linear varying coefficient model based on modal regression. J Nonparametr Stat 25(2):523–544
Zou H, Li R (2008) One-step sparse estimates in nonconcave penalized likelihood models. Ann Stat 36(4):1509–1533
Acknowledgments
The authors wish to express their great gratitude to the Editors and the Reviewers, who provided very interesting comments. Their suggestions were of great help when revising this work and will certainly increase its impact. In particular, the reviewing procedure was an opportunity for highly interesting exchanges with one Referee about the interest and the meaning of our model (see Remark 2), which have greatly contributed to improving this work, at least by pointing out some challenging issues for the future. The research of G. Aneiros was partially supported by Grants MTM2011-22392 and CN2012/130 from the Spanish Ministerio de Economía y Competitividad and Xunta Galicia, respectively.
Appendix
(A) Conditions on the model. In addition to both the general conditions (1), (2), (3), (4) and (7) and the specific ones (5) and (6), minor additional technical assumptions on the model include regularity on the observed grid of the curve \(\chi \):
$$\begin{aligned} \exists c_1,c_2, \ \forall j=1,\ldots ,p_n -1, \ 0<c_1p_n^{-1}<t_{j+1}-t_j<c_2p_n^{-1}< \infty , \end{aligned} \tag{18}$$
as well as smoothness and boundedness conditions on \(\chi \):
$$\begin{aligned} \chi { \text{ is } \text{ Lipschitz } \text{ continuous } \text{ on } \text{ its } \text{ support }} \end{aligned} \tag{19}$$
and
$$\begin{aligned} \exists \eta , \ \forall t\in [C,D], \ |\chi (t)|\ge \eta >0. \end{aligned} \tag{20}$$
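Condition (18) says that the gaps between consecutive grid points shrink at the exact rate \(p_n^{-1}\). As a rough illustration only, the following Python sketch computes \(p\) times the smallest and the largest gap of a given grid; condition (18) requires these two quantities to stay bounded away from zero and infinity as \(p_n\) grows. The function names and the uniform grid on \([0,1]\) are hypothetical choices made for this example and do not come from the paper.

```python
from typing import List, Sequence, Tuple


def grid_regularity_constants(grid: Sequence[float]) -> Tuple[float, float]:
    """Return (p * smallest gap, p * largest gap) for a grid t_1 < ... < t_p.

    Any c_1 below the first value and any c_2 above the second one
    satisfy the inequalities in (18) for this particular grid.
    """
    p = len(grid)
    if p < 2:
        raise ValueError("need at least two grid points")
    gaps = [b - a for a, b in zip(grid, grid[1:])]
    if min(gaps) <= 0:
        raise ValueError("grid must be strictly increasing")
    return p * min(gaps), p * max(gaps)


def uniform_grid(p: int, lo: float = 0.0, hi: float = 1.0) -> List[float]:
    """Hypothetical example grid: p equally spaced points on [lo, hi]."""
    return [lo + (hi - lo) * j / (p - 1) for j in range(p)]


# Both scaled gaps stay close to 1 as p grows, so (18) is plausible here.
for p in (10, 100, 1000):
    low, high = grid_regularity_constants(uniform_grid(p))
    print(p, round(low, 4), round(high, 4))
```

A grid whose points accumulate near one end (for example \(t_j = (j/p)^2\)) would make the first quantity tend to zero, which is the kind of design that (18) rules out.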
(B) Conditions on the linear estimate and the variable selection. Let us consider the partial linear regression model
$$\begin{aligned} Y=\sum _{j \in {\mathcal {P}}_n} \gamma ^jX^j + m(\zeta ) + \epsilon , \end{aligned} \tag{21}$$
where \({\mathcal {P}}_n \subset \{1,\ldots ,p_n\}\) with \(\#{\mathcal {P}}_n=O(\omega _n)\) or \(\#{\mathcal {P}}_n=O(s_n)\). Let us assume that the standard SCAD-penalized least squares procedure leads to estimates \(\tilde{\gamma }^j\) of \(\gamma ^j\) satisfying the following properties:
$$\begin{aligned} P\Bigl (\{{j \in {\mathcal {P}}_n},\gamma ^j=0\}=\{{j \in {\mathcal {P}}_n},\tilde{\gamma }^j=0\} \Bigr ) \rightarrow 1 { \text{ as } } n \rightarrow \infty \end{aligned} \tag{22}$$
and
$$\begin{aligned} \exists \theta \ge 0,\quad ||\tilde{\gamma } - \gamma || = O_p (n^{-1/2}(\#{\mathcal {P}}_n)^\theta ), \end{aligned} \tag{23}$$
where we have denoted \(\gamma =(\gamma ^j,j\in {\mathcal {P}}_n)^{\prime }\).
Remark 5
As noted in Remark 3, suitable conditions under which (22) and (23) hold can be seen in Aneiros et al. (2014).
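For readers unfamiliar with the SCAD penalty underlying conditions (22) and (23), the following Python sketch evaluates the penalty of Fan and Li (2001) and its derivative. It is only an illustration of the penalty function itself, not of the full penalized least squares procedure used in the paper; the function names are hypothetical, and the default \(a=3.7\) is the value suggested by Fan and Li (2001), assumed here rather than taken from this appendix.

```python
def scad_penalty(theta: float, lam: float, a: float = 3.7) -> float:
    """SCAD penalty p_lam(|theta|) of Fan and Li (2001); needs lam > 0, a > 2."""
    if lam <= 0 or a <= 2:
        raise ValueError("need lam > 0 and a > 2")
    t = abs(theta)
    if t <= lam:
        # Lasso-like linear part near zero: this is what produces sparsity.
        return lam * t
    if t <= a * lam:
        # Quadratic transition between the linear and the constant parts.
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Constant part: large coefficients are not penalized any further.
    return lam * lam * (a + 1) / 2


def scad_derivative(theta: float, lam: float, a: float = 3.7) -> float:
    """Derivative of the SCAD penalty with respect to |theta|, for |theta| > 0."""
    if lam <= 0 or a <= 2:
        raise ValueError("need lam > 0 and a > 2")
    t = abs(theta)
    if t <= lam:
        return lam
    return max(a * lam - t, 0.0) / (a - 1)


if __name__ == "__main__":
    for value in (0.0, 0.5, 1.0, 2.0, 3.7, 10.0):
        print(value, scad_penalty(value, lam=1.0), scad_derivative(value, lam=1.0))
```

The derivative vanishes for \(|\theta |>a\lambda \), so large coefficients are left essentially unshrunk, while small ones are shrunk as with an \(L_1\) penalty; this combination is what makes selection consistency statements such as (22) attainable alongside a rate such as (23).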
(C) Conditions on the nonparametric estimate. Let us consider the nonparametric models
$$\begin{aligned} Y^{*}=m(\zeta ) + \epsilon \end{aligned}$$
and
$$\begin{aligned} X^{j}=g_j(\zeta ) + \eta _j,\quad \ j=1,\ldots ,p_n, \end{aligned}$$
and let \(\widehat{m}^{*}(z)\) and \(\widehat{g}_{j}(z)\) denote the corresponding nonparametric estimates for \(m(z)\) and \(g_j(z)\), respectively, obtained from the models above by using the same kind of weights as those used in the PVS procedure. Let us assume that
$$\begin{aligned} \sup _{z\in \mathcal {C}}\left| \widehat{m}^{*}(z)-m(z)\right| =O_{p}\left( a_n \right) , \end{aligned} \tag{24}$$
$$\begin{aligned} \sup _{z\in \mathcal {C}, j\in {\mathcal {S}}_{n}}|\widehat{g} _{j}(z)-g_{j}(z)|=O_{p}\left( b_n \right) \end{aligned} \tag{25}$$
and
$$\begin{aligned} \sup _{z\in \mathcal {C},j\in {\mathcal {S}}_{n}}|g_{j}(z)|=O\left( 1\right) . \end{aligned} \tag{26}$$
Remark 6
Specific rates of convergence for nonparametric estimators with functional covariate can be seen in Ferraty et al. (2010).
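To make the objects \(\widehat{m}^{*}\) and \(\widehat{g}_{j}\) more concrete, here is a minimal Python sketch of a kernel estimate with a functional covariate. It assumes Nadaraya-Watson-type weights, which this appendix does not spell out; the \(L_2\) distance on a regular grid, the quadratic kernel and all function names are likewise assumptions made for the illustration, not the authors' choices.

```python
import math
from typing import List, Sequence


def l2_distance(u: Sequence[float], v: Sequence[float], step: float) -> float:
    """Riemann-sum approximation of the L2 distance between two curves
    observed on the same regular grid with spacing `step`."""
    return math.sqrt(step * sum((a - b) ** 2 for a, b in zip(u, v)))


def kernel(u: float) -> float:
    """Quadratic kernel supported on [0, 1] (an assumed choice)."""
    return 1.0 - u * u if 0.0 <= u <= 1.0 else 0.0


def nw_weights(z: Sequence[float], curves: Sequence[Sequence[float]],
               h: float, step: float) -> List[float]:
    """Normalized kernel weights of each sample curve relative to z."""
    raw = [kernel(l2_distance(z, c, step) / h) for c in curves]
    total = sum(raw)
    if total == 0.0:
        raise ValueError("no sample curve within bandwidth h of z")
    return [r / total for r in raw]


def nw_estimate(z: Sequence[float], curves: Sequence[Sequence[float]],
                responses: Sequence[float], h: float, step: float) -> float:
    """Weighted average of the responses; estimates m(z) when the responses
    are the Y*_i, or g_j(z) when they are the X_i^j."""
    weights = nw_weights(z, curves, h, step)
    return sum(w * y for w, y in zip(weights, responses))
```

Calling `nw_estimate` with the same curves and bandwidth but different response vectors gives estimates of \(m\) and of each \(g_j\) built from the same weights, which mirrors the requirement that both estimates use the same kind of weights.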
(D) Some asymptotics.
Theorem 1
Under conditions (1)–(7) and (18)–(23), and if \(\omega _n\rightarrow \infty \) as \(n \rightarrow \infty \), one has
and
If in addition conditions (24), (25) and (26) hold, and \(h\rightarrow 0\) and \(b_n \rightarrow 0\) as \(n \rightarrow \infty \), then
(E) Some remarks. It is illustrative to compare the rates obtained in Theorem 1 with those of a standard procedure. To the best of our knowledge, Aneiros et al. (2014) is at present the only paper devoted to variable selection in partial linear models with a functional covariate in the nonparametric component. Thus, we compare the PVS procedure with the proposal in Aneiros et al. (2014). The first step is to make sure that (22) and (23) hold in the specific setting where the covariates \(X^j\) come from a curve. If, in such a setting, one restricts to fixed \(s_n=s<\infty \), the conditions imposed in Aneiros et al. (2014) are trivially fulfilled, and (22) and (23) hold (with \(\theta =0\)). Thus, the PVS and the standard SCAD-penalized least squares estimators of the parametric component \(\alpha \) converge at the same rate (\(O_p(n^{-1/2})\)). In addition, the corresponding nonparametric estimators of the functional \(m\) achieve the common uniform rate
$$\begin{aligned} O_{p}\left( h^{\delta }+\sqrt{1/(n\Phi _{n})}\right) \end{aligned}$$
where we have denoted \(\Phi _{n}=\phi (h)/\psi _{\mathcal {C}}\left( 1/n\right) \), with \(\phi (\cdot )\) and \(\psi _{\mathcal {C}}(\cdot )\) being the small ball probability function and the Kolmogorov entropy associated with \(\mathcal {C}\), respectively, corresponding to the functional variable \(\zeta \), and \(\delta \) denotes the order of a Hölder condition imposed on the various smooth functions associated with the partial linear regression model. Finally, it is worth noting that more research is needed to obtain better rates for the PVS procedure than for the standard one. Basically, it is sufficient that (22) and (23) hold for some \(\theta >0\), assuming covariates coming from a curve.
(F) Sketch of the proofs. Proofs of both (27) and (28) can be obtained by using the same techniques as those in Aneiros and Vieu (2014), while (29) follows easily from (27) and (28). Thus, to save space, only sketches of the corresponding proofs are presented here.
Proof of (27)
Considering
in (23), and taking into account that \(\#{\mathcal {R}}_n=O(s_n)\) w.p.1., similar arguments as those in Aneiros and Vieu (2014) can be used to obtain that
On the other hand, if now one considers
in (23) and applies both condition (22) and Lemma 2 in Aneiros and Vieu (2014), one gets
Finally, (30) and (31) together with the fact that
conclude the proof.\(\square \)
Proof of (28)
In a similar way as in Aneiros and Vieu (2014), considering \({\mathcal {P}}_n={\mathcal {R}}^{*}_n\) in (22) one obtains that
and
where we have denoted \(\tilde{S}_{{\mathcal {R}}^{*}_n} = \{j \in {\mathcal {R}}^{*}_n, \tilde{\gamma }^j \ne 0\}.\) In addition, from Lemma 2 in Aneiros and Vieu (2014) together with condition (22) applied when \({\mathcal {P}}_n={\mathcal {Q}}^{*}_n\) is considered, one gets that
where, to avoid confusing the estimators in (32)–(33) and (34), notation \(\beta \) was used instead of \(\gamma \). Finally, the claimed result (28) follows directly from (32)–(34) together with the fact that
\(\square \)
Proof of (29)
It is easy to obtain that
where we have denoted
and
In addition, from condition (3) and result (28) one has that
and, using the condition (26), one gets
Finally, results (27) and (35)–(37) together with conditions (24) and (25) and the facts that \(h\rightarrow 0\) and \(b_n\rightarrow 0\) as \(n \rightarrow \infty \) give the claimed result.\(\square \)
Cite this article
Aneiros, G., Vieu, P. Partial linear modelling with multi-functional covariates. Comput Stat 30, 647–671 (2015). https://doi.org/10.1007/s00180-015-0568-8
DOI: https://doi.org/10.1007/s00180-015-0568-8
Keywords
- Semi-parametrics
- Functional data analysis
- Multi-functional covariates
- Partial linear model
- Variable selection