TEST, Volume 23, Issue 4, pp 806–843

A simultaneous confidence corridor for varying coefficient regression with sparse functional data

  • Lijie Gu
  • Li Wang
  • Wolfgang K. Härdle
  • Lijian Yang
Original Paper

Abstract

We consider a varying coefficient regression model for sparse functional data, with a time-varying response variable depending linearly on some time-independent covariates with coefficients as functions of time-dependent covariates. Based on spline smoothing, we propose data-driven simultaneous confidence corridors for the coefficient functions with asymptotically correct confidence level. Such confidence corridors are useful benchmarks for statistical inference on the global shapes of coefficient functions under any hypotheses. Simulation experiments corroborate the theoretical results. An example from a CD4/HIV study is used to illustrate how inference is made with computable p values on the effects of smoking, pre-infection CD4 cell percentage and age on the CD4 cell percentage of HIV infected patients under treatment.

Keywords

B spline Confidence corridor Karhunen–Loève \(L^{2}\) representation Knots Functional data Varying coefficient 

Mathematics Subject Classification (2000)

62G08 62G15 62G32 

1 Introduction

Functional data, also known as “curve data”, are commonly encountered in biomedical studies, epidemiology and social science, where information is collected over a time period for each subject. Conceptually, such data can be viewed as a simple random sample from the abstract space of functions, see for instance, Ferraty and Vieu (2006), Manteiga and Vieu (2007). For the functional data analysis (FDA) approach without nonparametric smoothing, see Gabrys et al. (2010), and the recent comprehensive review in Horváth and Kokoszka (2012). In this paper, we have taken from Ramsay and Silverman (2005) the more convenient view of functional data as discretely recorded observations of independent stochastic processes contaminated with measurement errors.

In many longitudinal studies, repeated measurements are collected at a finite number of time points. If the time points of observation for every subject are dense and regular, the data are termed dense functional data, see Cao et al. (2012a, b), and Zhu et al. (2012) for theoretical development and real examples of dense functional data. If, on the other hand, data collection is made at few and irregular time points for each subject, the data are frequently referred to as sparse longitudinal or sparse functional data, see James et al. (2000), James and Sugar (2003), Yao et al. (2005a), Hall et al. (2006), Zhou et al. (2008), Ma et al. (2012) for works on sparse functional data. It should be emphasized that by “sparse” we mean that the covariate is observed sparsely over a compact interval, which has nothing to do with sparsity in the variable selection context, such as the popular LASSO method. A crucial condition for sparse FDA is that the time points from all subjects are dense in the data range despite sparsity for any individual subject, see Assumption (A3) in Appendix A that the design density \(f(t)\) has a positive lower bound \(c_{f}\), which implies that the sampling frequency is almost uniform for the time covariate.

In longitudinal studies, interest often lies in the association between the covariates and the response variable. In recent years, there has been an increasing interest in nonparametric analysis of longitudinal data to enhance flexibility, see e.g., Yao and Li (2013). The varying coefficient model (VCM) proposed by Hastie and Tibshirani (1993) strikes a delicate balance between the simplicity of linear regression and the flexibility of multivariate nonparametric regression and has been widely applied in various settings, for instance, the Cobb–Douglas model for GDP growth in Liu and Yang (2010), and the longitudinal model for CD4 cell percentages in AIDS patients in Wu and Chiang (2000), Fan and Zhang (2000) and Wang et al. (2008). See Fan and Zhang (2008) for an extensive literature review of VCM.

To examine whether the association changes over time, Hoover et al. (1998) proposed the following VCM
$$\begin{aligned} Y(t)=\beta _{0}(t)+\mathbf {X}(t)^{\scriptstyle {\mathsf {T}}}\varvec{\beta } (t)+\varepsilon (t),\quad t\in \mathcal {T}, \end{aligned}$$
(1)
where \(\mathbf {X}(t) = (X_{1}(t),\ldots ,X_{d}(t))^{\scriptstyle {\mathsf {T}}}\) are covariates at time \(t\), \(\varepsilon (t)\) is a mean zero process, and \( \varvec{\beta }(t) =(\beta _{1}(t),\ldots ,\beta _{d}(t))^{\scriptstyle { \mathsf {T}}}\) are functions of \(t\). Model (1) is a special case of functional linear models, see Ramsay and Silverman (2005) and Wu et al. (2010).

The coefficient functions \(\beta _{l}(t)\)s in model (1) can be estimated by, for example, kernel method in Hoover et al. (1998), basis function approximation method in Huang et al. (2002), polynomial spline method in Huang et al. (2004) and smoothing spline method in Brumback and Rice (1998). Fan and Zhang (2000) proposed a two-step method to overcome the computational burden of the smoothing spline method.

For some longitudinal studies, the covariates are independent of time, and their observations are cross-sectional. Take for instance the longitudinal CD4 cell percentage data among HIV seroconverters. This dataset contains 1,817 observations of CD4 cell percentages on 283 homosexual men infected with the HIV virus. Three of the covariates are observed at the time of HIV infection and hence by nature independent of the measurement time and frequency: \(X_{i1}\), the \(i\)th patient’s smoking status; \(X_{i2}\), the \(i\)th patient’s centered pre-infection CD4 percentage; and \(X_{i3}\) the \(i\)th patient’s centered age at the time of HIV infection. A fourth predictor, however, is time dependent: \(T_{ij}\), the time (in years) of the \(j\)th measurement of CD4 cell on the \(i\)th patient after HIV infection; while the response \(Y_{ij}\) is also time dependent: the \(j\)th measurement of the \(i\)th patient’s CD4 cell percentage at time \(T_{ij}\). Wu and Chiang (2000), Fan and Zhang (2000) and Wang et al. (2008) all contain detailed descriptions and analysis of this dataset.

A feasible VCM for multivariate functional data such as the above takes the form
$$\begin{aligned} Y_{ij}=\sum _{l=1}^{d}\eta _{il}\left( T_{ij}\right) X_{il}+\sigma \left( T_{ij}\right) \varepsilon _{ij},\ \ 1\le i\le n,\ 1\le j\le N_{i}, \end{aligned}$$
(2)
where the measurement errors \(\left( \varepsilon _{ij}\right) _{i=1,j=1}^{n,N_{i}}\) satisfy \({{\mathsf {E}}}\left( \varepsilon _{ij}\right) =0\), \({{\mathsf {E}}}(\varepsilon _{ij}^{2})=1\), and \( \left\{ \eta _{il}(t),t\in \mathcal {T}\right\} \) are i.i.d copies of a \( L^{2} \) process \(\left\{ \eta _{l}(t),t\in \mathcal {T}\right\} \), i.e., \( {{\mathsf {E}}}\int \nolimits _{\mathcal {T}}\eta _{l}^{2}(t)\mathrm{d}t<+\infty \), \( l=1,\ldots ,d\). The common mean function of processes \(\left\{ \eta _{l}(t),t\in \mathcal {T}\right\} \) is denoted as \(m_{l}(t)= {{\mathsf {E}}}\{\eta _{l}(t)\}\), \(l=1,\ldots ,d\). The actual data set consists of \( \left\{ \mathbf {X}_{i},T_{ij},Y_{ij}\right\} \), \(1\le i\le n\), \(1\le j\le N_{i}\), in which the \(i\)th subject is observed \(N_{i}\) times, the time-independent covariates for the \(i\)th subject are \(\mathbf {X} _{i}=\left( X_{il}\right) _{l=1}^{d}\), \(1\le i\le n\), and the random measurement time \(T_{ij}\in \mathcal {T}=\left[ a,b\right] \). The aforementioned data example is called sparse functional as the number of measurements \(N_{i}\) for the \(i\)th subject is relatively low. (In the above CD4 example actually at most \(14\)). In contrast, for a dense functional data \(\lim _{n\rightarrow \infty }\min _{1\le i\le n}N_{i}=\infty \).

For the CD4 cell percentage data, we introduce a fourth time-independent covariate, the baseline \(X_{i0}\equiv 1\), and denote by \(m_{l}\left( t\right) \), \(l=0,1,2,3\), the coefficient functions for baseline CD4 percentage, smoking status, centered pre-infection CD4 percentage and centered age, respectively. Figures 2, 3, 4, 5 contain spline estimates of the \(m_{l}\left( t\right) \), \(0\le l\le 3\), and simultaneous confidence corridors (SCC) at various confidence levels.

In previous works the theoretical focus has mainly been on consistency and asymptotic normality of the estimators of the coefficient functions of interest, and the construction of pointwise confidence intervals. However, as demonstrated in Fan and Zhang (2000), this is unsatisfactory as investigators are often interested in testing whether some coefficient functions are significantly nonzero or varying, for which an SCC is needed. Take, for instance, Fig. 3, which shows that both the 95 % and the 20.277 % SCCs of \(m_{1}\left( t\right) \) contain the zero line completely; thus, with a very high p value of \(0.79723\), the null hypothesis \(m_{1}\left( t\right) \equiv 0,t\in \mathcal {T}\) is not rejected. More details are in Sect. 6.

Construction of computationally simple SCCs with exact coverage probability is known to be difficult even with independent cross-sectional data; see, Wang and Yang (2009) and related earlier work Härdle and Luckhaus (1984) on uniform consistency. Most earlier methods proposed in the literature restrict to asymptotic conservative SCCs. Wu et al. (1998) developed asymptotic SCCs for the unknown coefficients based on local polynomial methods, which are computationally intensive, as the kernel estimator requires solving an optimization problem at every point. Huang et al. (2004) proposed approximating each coefficient function by a polynomial spline and developed spline SCCs, which are simpler to construct, while Xue and Zhu (2007) proposed maximum empirical likelihood estimators and constructed SCCs for the coefficient functions. All these SCCs are Bonferroni-type variability bands according to Hall and Titterington (1988). The idea is to invoke pointwise confidence intervals on a very fine grid of \( [a,b]\), then adjust the level of these confidence intervals by the Bonferroni method to obtain uniform confidence bands, and finally bridge the gaps between the grid points via smoothness conditions on the coefficient curve. However, to use these bands in practice, one must have a priori bounds on the magnitude of the bias on each subinterval as well as a choice for the number of grid points. Chiang et al. (2001) proposed a bootstrap procedure to construct confidence intervals. However, theoretical properties of their procedures have not yet been developed.

In this paper, we derive SCCs with exact coverage probability for the coefficient functions \(m_{l}(t),\) \(l=1,\ldots ,d\), in (3) via extreme value theory of Gaussian processes and approximating coefficient functions by piecewise-constant splines. The results represent the first attempt at developing exact SCCs for the coefficient functions in VCM for sparse functional data. Our simulation studies indicate the proposed SCCs are computationally efficient and have the right coverage probability for finite samples. Our work parallels Zhu et al. (2012) which established asymptotic theory of SCC in the case of VCM for dense functional data. It is important to mention as well that the linear covariates in Zhu et al. (2012) are time dependent, which does not complicate the problem as they work with dense data instead of the sparse data we concentrate on. Our work can also be viewed as an extension of the univariate longitudinal regression in Ma et al. (2012) to varying coefficient regression, the latter corresponds exactly to the special case of \(d=1, X_{i1}\equiv 1\). Theorem 1 of Ma et al. (2012) follows from Theorems 1 and 2 in this paper with some slight modifications.

We organize our paper as follows. Section 2 describes the spline estimators and establishes their asymptotic properties for sparse longitudinal data. Section 3.1 proposes asymptotic pointwise confidence intervals and SCCs constructed from piecewise constant splines. Section 3.2 describes the actual steps to implement the proposed SCCs. In Sect. 4, we provide further insights into the estimation error structure of spline estimators. Section 5 reports findings from a simulation study. A real data example appears in Sect. 6. Proofs of technical lemmas are in Appendix A.

2 Spline estimation and asymptotic properties

For a functional data \(\left\{ \mathbf {X}_{i},T_{ij},Y_{ij}\right\} \), \( 1\le i\le n\), \(1\le j\le N_{i}\), denote the eigenvalues and eigenfunctions sequences of its covariance operator \(G_{l}\left( s,t\right) = {\mathrm{cov}}\left\{ \eta _{l}(s),\eta _{l}(t)\right\} \) as \(\left\{ \lambda _{k,l}\right\} _{k=1}^{\infty }\), \(\left\{ \psi _{k,l}(t)\right\} _{k=1}^{\infty }\), in which \(\lambda _{1,l}\ge \lambda _{2,l}\ge \cdots \ge 0\), \(\sum _{k=1}^{\infty }\lambda _{k,l}<\infty \), and \(\left\{ \psi _{k,l}\right\} _{k=1}^{\infty }\) form an orthonormal basis of \(L^{2}\left( \mathcal {T}\right) \). It follows from spectral theory that \(G_{l}\left( s,t\right) =\sum _{k=1}^{\infty }\lambda _{k,l}\psi _{k,l}(s)\psi _{k,l}\left( t\right) \). For any \(l=1,\ldots ,d\), the \(i\)th trajectory \(\left\{ \eta _{il}(t),t\in \mathcal {T}\right\} \) allows the Karhunen–Loève \(L^{2}\) representation (Yao et al. 2005b): \(\eta _{il}(t)=m_{l}(t)+\sum _{k=1}^{\infty }\xi _{ik,l}\phi _{k,l}(t)\), where the random coefficients \(\xi _{ik,l}\) are uncorrelated with mean \(0\) and variances \(1\), and the functions \(\phi _{k,l}=\sqrt{\lambda _{k,l}}\psi _{k,l}\), thus \(G_{l}(s,t)=\sum _{k=1}^{\infty }\phi _{k,l}(s)\phi _{k,l}\left( t\right) \), and the response measurements (2) can be represented as follows:
$$\begin{aligned} Y_{ij}=\sum _{l=1}^{d}m_{l}\left( T_{ij}\right) X_{il}+\sum _{l=1}^{d}\sum _{k=1}^{\infty }\xi _{ik,l}\phi _{k,l}\left( T_{ij}\right) X_{il}+\sigma \left( T_{ij}\right) \varepsilon _{ij}. \end{aligned}$$
(3)
Without loss of generality, we take \(\mathcal {T}=\left[ a,b\right] \) to be \( [0,1]\). Following Xue and Yang (2006), we approximate each coefficient function by the spline smoothing method. To describe the spline functions, one can divide the finite interval \([0,1]\) into (\(N_{{\mathrm{s}}}+1\)) equal subintervals \(\chi _{J}=\left[ \upsilon _{J},\upsilon _{J+1}\right) \), \( J=0,\ldots ,N_{{\mathrm{s}}}-1,\chi _{N_{{\mathrm{s}}}}=\left[ \upsilon _{N_{{\mathrm{s}} }},1\right] \). A sequence of equally spaced points \(\left\{ \upsilon _{J}\right\} _{J=1}^{N_{{\mathrm{s}}}}\), called interior knots, are given as \( \upsilon _{0}=0<\upsilon _{1}<\cdots <\upsilon _{N_{{\mathrm{s}}}}<1=\upsilon _{N_{{\mathrm{s}}}+1}.\) Let \(\upsilon _{J}=Jh_{{\mathrm{s}}}\) for \(0\le J\le N_{ {\mathrm{s}}}+1\), where \(h_{{\mathrm{s}}}=1/\left( N_{{\mathrm{s}}}+1\right) \) is the distance between neighboring knots. We denote by \(G^{\left( -1\right) }=G^{\left( -1\right) }\left[ 0,1\right] \) the space of functions that are constant on each subinterval \(\chi _{J}\), and the B-spline basis of \( G^{\left( -1\right) }\), as \(\left\{ b_{J}(t)\right\} _{J=0}^{N_{{\mathrm{s}}}}\), which are simply indicator functions of intervals \(\chi _{J}\), \( b_{J}(t)=I_{\chi _{J}}(t)\), \(J=0,1,\ldots ,N_{{\mathrm{s}}}\). For any \(t\in \left[ 0,1\right] \), define its location index as \(J(t)=J_{n}(t)=\min \left\{ \left[ t/h_{{\mathrm{s}}}\right] ,N_{{\mathrm{s}}}\right\} \) so that \(t\in \chi _{J(t)}\).
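As an illustration of the notation above, the following minimal Python sketch evaluates the location index \(J(t)\) and the constant B-spline basis \(b_{J}(t)=I_{\chi _{J}}(t)\) on \([0,1]\). The function names `location_index` and `constant_bspline_basis` are hypothetical and chosen only for this example.

```python
import math


def location_index(t, n_interior_knots):
    """Return J(t) = min([t / h_s], N_s), where h_s = 1 / (N_s + 1)."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    # floor(t * (N_s + 1)) is [t / h_s]; min() places t = 1 in the last bin,
    # which is the closed interval [v_{N_s}, 1].
    return min(math.floor(t * (n_interior_knots + 1)), n_interior_knots)


def constant_bspline_basis(t, n_interior_knots):
    """Return (b_0(t), ..., b_{N_s}(t)), the indicators of the bins chi_J."""
    basis = [0.0] * (n_interior_knots + 1)
    basis[location_index(t, n_interior_knots)] = 1.0
    return basis
```

With \(N_{{\mathrm{s}}}=3\) interior knots, for example, the bins have width \(h_{{\mathrm{s}}}=0.25\), and exactly one basis function is nonzero at any given \(t\).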
Next we define the space of spline coefficient functions on \(\mathcal {T} \times \mathbb {R}^{d}\) as
$$\begin{aligned} \mathcal {M}=\left\{ g\left( t,\mathbf {x}\right) =\sum _{l=1}^{d}g_{l}(t)x_{l}:g_{l}(t)\in G^{\left( -1\right) },t\in \mathcal { T},\mathbf {x}=\left( x_{1},\ldots ,x_{d}\right) ^{\scriptstyle {\mathsf {T}} }\in \mathbb {R}^{d}\right\} , \end{aligned}$$
and propose estimating the multivariate function \(\sum \nolimits _{l=1}^{d}m_{l}(t)x_{l}\) by
$$\begin{aligned} \hat{m}\left( t,\mathbf {x}\right) =\sum _{l=1}^{d}\hat{m}_{l}(t)x_{l}= \underset{g\in \mathcal {M}}{{\mathrm{argmin}}}\sum _{i=1}^{n}\sum \limits _{j=1}^{N_{i}}\left\{ Y_{ij}-g\left( T_{ij},\mathbf {X}_{i}\right) \right\} ^{2}. \end{aligned}$$
(4)
Let \(\sigma _{Y}^{2}(t,\mathbf {x})\) be the conditional variance of \(\mathbf {Y }\) given \(T=t\) and \(\mathbf {X}=\mathbf {x}=\left( x_{1},\ldots ,x_{d}\right) ^{\scriptstyle {\mathsf {T}}}\in \mathbb {R}^{d}\)
$$\begin{aligned} \sigma _{Y}^{2}(t,\mathbf {x})={\mathrm {\mathsf {Var}}}(Y\left| T=t, \mathbf {X=x}\right. )=\sum _{l=1}^{d}G_{l}\left( t,t\right) x_{l}^{2}+\sigma ^{2}(t). \end{aligned}$$
Next for any \(t\in \left[ 0,1\right] \), let
$$\begin{aligned} \mathbf {\Gamma }_{n}(t)&= c_{J(t),n}^{-2}\{n{{\mathsf {E}}} (N_{1})\}^{-1}{{\mathsf {E}}}\mathbf {XX}^{\scriptstyle {\mathsf {T}}} \left[ \int \nolimits _{\chi _{J(t)}}\sigma _{Y}^{2}\left( u,\mathbf {X}\right) f\left( u\right) \mathrm{d}u\right. \nonumber \\&\left. +\,\frac{{{\mathsf {E}}}\left\{ N_{1}(N_{1}\!-\!1)\right\} }{ {{\mathsf {E}}}N_{1}}\sum \limits _{l=1}^{d}X_{l}^{2}\int \nolimits _{\chi _{J(t)}\times \chi _{J(t)}}G_{l}\left( u,v\right) f\left( u\right) f\left( v\right) \mathrm{d}u\mathrm{d}v\right] \!, \end{aligned}$$
(5)
where
$$\begin{aligned} c_{J,n}={{\mathsf {E}}}b_{J}^{2}(T)=\int \nolimits _{0}^{1}b_{J}^{2}(t)f(t)\mathrm{d}t, \quad J=0,\ldots ,N_{{\mathrm{s}}}. \end{aligned}$$
(6)
Further denote
$$\begin{aligned} \mathbf {\Sigma }_{n}(t)=\mathbf {H}^{-1}\mathbf {\Gamma }_{n}(t)\mathbf {H} ^{-1}=\left\{ \sigma _{n,ll^{\prime }}^{2}(t)\right\} _{l,l^{\prime }=1}^{d}\!, \end{aligned}$$
(7)
where \(\sigma _{n,ll^{\prime }}^{2}(t)\) are later shown to be the asymptotic covariances between \(\hat{m}_{l}(t)\) and \(\hat{m}_{l^{\prime }}(t)\).

Theorem 1

Under Assumptions (A1)–(A6) in Appendix A, for any \( t\in \left[ 0,1\right] \), as \(n\rightarrow \infty \),
$$\begin{aligned} \mathbf {\Sigma }_{n}^{-1/2}\left( t\right) \{\hat{\mathbf {m}}(t)-\mathbf {m} \left( t\right) \}\overset{\mathcal {L}}{\longrightarrow }N\left( \mathbf {0}, \mathbf {I}_{d\times d}\right) \!, \end{aligned}$$
where \(\hat{\mathbf {m}}(t)=\left( \hat{m}_{1}(t),\ldots ,\hat{m} _{d}(t)\right) ^{\scriptstyle {\mathsf {T}}}\) is the estimate of \(\mathbf {m} (t)=\left( {m}_{1}(t),\ldots ,{m}_{d}(t)\right) ^{\scriptstyle {\mathsf {T}}}.\) Furthermore, for any \(l=1,\ldots ,d\) and \(\alpha \in \left( 0,1\right) ,\)
$$\begin{aligned} \lim _{n\rightarrow \infty }P\left\{ \sigma _{n,ll}^{-1}(t)\left| \hat{m} _{l}(t)-m_{l}(t)\right| \le Z_{1-\alpha /2}\right\} =1-\alpha . \end{aligned}$$

Remark 1

Note that \(\mathbf {\Sigma }_{n}(t)=\left\{ \sigma _{n,ll^{\prime }}^{2}(t)\right\} _{l,l^{\prime }=1}^{d}\) in (7) is complicated to compute in practice. The next proposition suggests that, for any \(t\in \left[ 0,1\right] \), \(\mathbf {\Gamma }_{n}(t)\) in (5) can be simplified by
$$\begin{aligned} \tilde{\mathbf {\Gamma }}_{n}(t)\equiv {{\mathsf {E}}}\left[ \mathbf {XX} ^{\scriptstyle {\mathsf {T}}}\frac{\sigma _{Y}^{2}(t,\mathbf {X})}{f(t)h_{{\mathrm{ s}}}n{{\mathsf {E}}}(N_{1})}\left\{ 1+\frac{{{\mathsf {E}}} N_{1}\left( N_{1}-1\right) }{{{\mathsf {E}}}N_{1}}\frac{ \sum _{l=1}^{d}X_{l}^{2}G_{l}\left( t,t\right) f(t)h_{{\mathrm{s}}}}{\sigma _{Y}^{2}(t,\mathbf {X})}\right\} \right] . \end{aligned}$$
(8)

Denote the supremum of a function \(\phi \) on \(\left[ a,b\right] \) by \( \left\| \phi \right\| _{\infty }=\sup _{t\in \left[ a,b\right] }\left| \phi (t)\right| \). For any matrix \(\mathbf {A}=\left( a_{ij}\right) \), define \(\left\| \mathbf {A}\right\| _{\infty }=\max \left| a_{ij}\right| \), where the maximum is taken over all the elements of \(\mathbf {A}\), while for a matrix function \(\mathbf {A}(t)=\left( a_{ij}(t)\right) \), \(\left\| \mathbf {A}\right\| _{\infty }=\sup _{t\in \left[ a,b\right] }\left\| \mathbf {A}(t)\right\| _{\infty }\).

Proposition 1

Under Assumptions (A2)–(A6) in Appendix A, there exists a constant \(c>0\) such that as \(n\rightarrow \infty \), \(\Vert \mathbf {\Gamma }_{n}(t)-\tilde{\mathbf {\Gamma }}_{n}(t)\Vert _{\infty }=\mathcal {O} \left( n^{-1}h_{{\mathrm{s}}}^{r-1}\right) =\mathcal {O}\left( n^{-c}\right) .\)

To derive the maximal deviation distribution of estimators \(\hat{m}_{l}(t)\), \(l=1,\ldots ,d\), let
$$\begin{aligned} Q_{N_{{\mathrm{s}}}+1}\left( \alpha \right) =b_{N_{{\mathrm{s}}}+1}-a_{N_{{\mathrm{s}} }+1}^{-1}\log \left\{ -\frac{1}{2}\log (1-\alpha )\right\} ,\quad \alpha \in \left( 0,1\right) \end{aligned}$$
(9)
$$\begin{aligned} a_{N_{{\mathrm{s}}}+1}=\left\{ 2\log \left( N_{{\mathrm{s}}}+1\right) \right\} ^{1/2},\quad b_{N_{{\mathrm{s}}}+1}=a_{N_{{\mathrm{s}}}+1}-\frac{ \log \left( 2\pi a_{N_{{\mathrm{s}}}+1}^{2}\right) }{2a_{N_{{\mathrm{s}} }+1}}. \end{aligned}$$
(10)
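The critical value in (9) and (10) is available in closed form, so it can be computed directly. The sketch below is a straightforward transcription of the two formulas; the function name `scc_quantile` is hypothetical.

```python
import math


def scc_quantile(n_interior_knots, alpha):
    """Evaluate Q_{N_s+1}(alpha) from (9), with a and b as in (10)."""
    n_bins = n_interior_knots + 1
    if n_bins < 2:
        raise ValueError("need at least one interior knot, else a = 0")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    a = math.sqrt(2.0 * math.log(n_bins))
    b = a - math.log(2.0 * math.pi * a * a) / (2.0 * a)
    # -0.5 * log(1 - alpha) is positive for alpha in (0, 1), so the outer log is defined.
    return b - math.log(-0.5 * math.log(1.0 - alpha)) / a
```

For a fixed number of knots the value decreases in \(\alpha \), and it exceeds the pointwise normal percentile \(Z_{1-\alpha /2}\) in typical settings, which is why the SCC is wider than the pointwise interval.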

Theorem 2

Under Assumptions (A1)–(A6) in Appendix A, for \( l=1,\ldots ,d\) and any \(\alpha \in \left( 0,1\right) \),
$$\begin{aligned} \lim _{n\rightarrow \infty }P\left\{ \sup _{t\in [0,1]}\sigma _{n,ll}^{-1}(t)\left| \hat{m}_{l}(t)-m_{l}(t)\right| \le Q_{N_{ {\mathrm{s}}}+1}\left( \alpha \right) \right\} =1-\alpha , \end{aligned}$$
where \(\sigma _{n,ll}(t)\) and \(Q_{N_{{\mathrm{s}}}+1}\left( \alpha \right) \) are given in (7) and (9), respectively.

One reviewer has pointed out that the use of constant instead of higher order splines is not optimal, with which we completely agree. Further research involving sophisticated nonstationary Gaussian process extreme value theory is needed to extend our present work to splines of any order, such as the popular cubic spline. To be precise, the analog of Proposition 4 for higher order splines concerns the maximum of a standardized continuous Gaussian process over the interval \([0,1]\), whereas for constant splines, the Gaussian process breaks down to \(N_{{\mathrm{s}}}+1\) weakly correlated standard Gaussian variables.

3 Asymptotic confidence regions

In this section, we construct the confidence regions for functions \(m_{l}(t)\), \(l=1,\ldots ,d\).

3.1 Asymptotic confidence intervals and SCCs

Theorems 1 and 2 allow one to construct pointwise confidence intervals and SCCs for components \(m_{l}(t)\), \(l=1,\ldots ,d\). The next corollary provides the theoretical underpinning upon which SCCs can actually be implemented; see Sect. 3.2.

Corollary 1

Under Assumptions (A1)–(A6) in Appendix A, for any \(l=1,\ldots ,d \) and \(\alpha \in \left( 0,1\right) \), as \(n\rightarrow \infty \),
  (i) an asymptotic \(100\left( 1-\alpha \right) \%\) pointwise confidence interval for \(m_{l}(t)\), \(t\in [0,1]\), is \(\hat{m} _{l}(t)\pm \sigma _{n,ll}(t)Z_{1-\alpha /2}\), with \(\sigma _{n,ll}(t)\) given in (7), while \(Z_{1-\alpha /2}\) is the \(100\left( 1-\alpha /2\right) \)th percentile of the standard normal distribution.

  (ii) an asymptotic \(100\left( 1-\alpha \right) \%\) SCC for \(m_{l}(t)\), with \(Q_{N_{{\mathrm{s}}}+1}\left( \alpha \right) \) given in (9), is \(\hat{m}_{l}(t)\pm \sigma _{n,ll}(t)Q_{N_{{\mathrm{s}}}+1}\left( \alpha \right) \), \(t\in [0,1]\).

One reviewer has raised the interesting question whether our SCC would significantly improve by some form of bootstrapping. The answer is negative for now due to the lack of convincing procedures that simultaneously resample from the unknown distributions of both the unobserved error \(\varepsilon _{ij}\)s and the unobserved functional principal components \(\xi _{ik,l}\)s. On the other hand, further investigation in FDA will lead to theoretically sound resampling methods analogous to the smoothed bootstrap for nonparametric regression in Claeskens and Van Keilegom (2003).

3.2 Implementation

In the following, we describe procedures to construct the SCCs and the pointwise intervals given in Corollary 1. For any data set \(\left( T_{ij},Y_{ij},X_{il}\right) _{i=1,j=1,l=1}^{n,N_{i},d}\) from model (3), the spline estimators \(\hat{m}_{l}(t)\), \( l=1,\ldots ,d\), are obtained by (4), and the number of interior knots is taken to be \(N_{{\mathrm{s}}}=[cN_{{\mathrm{T}}}^{1/3}(\log (n))]\), in which \(N_{{\mathrm{T}}}=\sum _{i=1}^{n} N_{i}\) is the total sample size, \([a]\) denotes the integer part of \(a\), and \(c\) is a positive constant.

To construct the SCCs, one needs to evaluate the functions \(\sigma _{n,ll}^{2}(t)\), \(l=1,\ldots ,d\), which are the diagonal elements of matrix \( \mathbf {\Sigma }_{n}(t)\) in (7). Based on Proposition 1, one can estimate each of the unknowns \(f(t)\), \(\sigma _{Y}^{2}(t, \mathbf {x})\), \(G_{l}\left( t,t\right) \) and the matrix \(\mathbf {H}\), and then plug these estimators into the formula of the SCCs; see Wang and Yang (2009).

The number of interior knots for pilot estimation of \(f(t)\), \(\sigma _{Y}^{2}(t,\mathbf {x})\), and \(G_{l}\left( t,t\right) \) is taken to be \(N_{ {\mathrm{s}}}^{*}=\left[ 0.5n^{1/3}\right] \), and \(h_{{\mathrm{s}}}^{*}=1/\left( 1+N_{{\mathrm{s}}}^{*}\right) \). The histogram estimator of the density function \(f(t)\) is \(\hat{f}(t)=N_{{\mathrm{T}}}^{-1}h_{{\mathrm{s}} }^{*-1}\sum _{i=1}^{n}\sum _{j=1}^{N_{i}}b_{J(t)}(T_{ij})\).
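The histogram estimator is simply the fraction of pooled observation times that fall in the bin containing \(t\), divided by the bin width \(h_{{\mathrm{s}}}^{*}\). A minimal Python sketch follows; the function names and the flat list `times` of all \(T_{ij}\) are assumptions made for this illustration.

```python
import math


def pilot_bin(u, n_pilot_knots):
    """Index of the pilot bin of width 1 / (N_s* + 1) that contains u in [0, 1]."""
    return min(math.floor(u * (n_pilot_knots + 1)), n_pilot_knots)


def histogram_density(t, times, n_pilot_knots):
    """Estimate f(t) as (number of T_ij in the bin of t) / (N_T * h_s*)."""
    n_bins = n_pilot_knots + 1
    target = pilot_bin(t, n_pilot_knots)
    count = sum(1 for u in times if pilot_bin(u, n_pilot_knots) == target)
    # Dividing by h_s* = 1 / n_bins is the same as multiplying by n_bins.
    return count * n_bins / len(times)
```

By construction the estimate is piecewise constant and integrates to one over \([0,1]\).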

To estimate the covariance matrix \(\mathbf {\Gamma }_{n}\left( t\right) \) in (5), define the raw covariance term \(R_{ij}=\left( Y_{ij}-\sum _{l=1}^{d}\hat{m}_{l}(T_{ij})X_{il}\right) ^{2}\), \(1\le j\le N_{i}\), \(1\le i\le n\). The estimator of \(\sigma _{Y}^{2}(t,\mathbf {x})\) is
$$\begin{aligned} \hat{\sigma }_{Y}^{2}(t,\mathbf {x})=\sum _{l=1}^{d}\sum _{J=0}^{N_{{\mathrm{s}} }^{*}}\hat{\rho }_{J,l}b_{J}(t)x_{l}^{2}+\sum _{J=0}^{N_{{\mathrm{s}}}^{*}} \hat{\mu }_{J}b_{J}(t)=\sum _{l=1}^{d}\hat{G}_{l}\left( t,t\right) x_{l}^{2}+ \hat{\sigma }^{2}(t), \end{aligned}$$
where \(\{\hat{\rho }_{0,1},\ldots ,\hat{\rho }_{N_{{\mathrm{s}}}^{*},d},\hat{ \mu }_{0},\ldots ,\hat{\mu }_{N_{{\mathrm{s}}}^{*}}\}^{\scriptstyle {\mathsf {T}} }\) are solutions of the following least squares problem:
$$\begin{aligned}&\left( \hat{\rho }_{0,1},\ldots ,\hat{\rho }_{N_{{\mathrm{s}}}^{*},d},\hat{ \mu }_{0},\ldots ,\hat{\mu }_{N_{{\mathrm{s}}}^{*}}\right) ^{\scriptstyle { \mathsf {T}}} \\&\quad = \mathop {{\mathrm{argmin}}}_{\left( \rho _{0,1},\ldots ,\mu _{N_{{\mathrm{s}}}^{*}}\right) ^{\scriptstyle {\mathsf {T}}}\in \mathbb {R}^{(N_{{\mathrm{s}}}^{*}+1)(d+1)}}\sum _{i=1}^{n}\sum _{j=1}^{N_{i}} \left\{ R_{ij}-\sum _{l=1}^{d}\sum _{J=0}^{N_{{\mathrm{s}}}^{*}}\rho _{J,l}b_{J}(T_{ij})X_{il}^{2}-\sum _{J=0}^{N_{{\mathrm{s}}}^{*}}\mu _{J}b_{J}(T_{ij})\right\} ^{2}. \end{aligned}$$
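Because the basis functions \(b_{J}\) are indicators of disjoint bins, the least squares problem above separates into one small regression per bin: within bin \(J\), the \(R_{ij}\) are regressed on \(X_{i1}^{2},\ldots ,X_{id}^{2}\) and a constant. The following NumPy sketch illustrates this under that observation; the function name and the array layout (one row per observation, with each subject's covariates repeated \(N_{i}\) times) are assumptions of the example, not part of the paper.

```python
import numpy as np


def variance_components(T, X, R, n_pilot_knots):
    """Per-bin least squares for (rho_{J,l}) and (mu_J).

    T: (N_T,) observation times in [0, 1]
    X: (N_T, d) covariates, subject rows repeated once per observation
    R: (N_T,) squared residuals R_ij
    Returns rho with shape (N_s* + 1, d) and mu with shape (N_s* + 1,).
    """
    n_bins = n_pilot_knots + 1
    d = X.shape[1]
    bins = np.minimum(np.floor(T * n_bins).astype(int), n_pilot_knots)
    rho = np.full((n_bins, d), np.nan)
    mu = np.full(n_bins, np.nan)
    for J in range(n_bins):
        mask = bins == J
        if mask.sum() < d + 1:
            continue  # too few observations in this bin; leave NaN
        design = np.column_stack([X[mask] ** 2, np.ones(mask.sum())])
        # lstsq returns the minimum-norm solution if the columns are collinear.
        coef, *_ = np.linalg.lstsq(design, R[mask], rcond=None)
        rho[J], mu[J] = coef[:d], coef[d]
    return rho, mu
```

Here `rho[J, l]` plays the role of \(\hat{G}_{l}(t,t)\) and `mu[J]` that of \(\hat{\sigma }^{2}(t)\) for \(t\) in bin \(J\).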
The matrix \(\mathbf {\Gamma }_{n}\left( t\right) \) is estimated by substituting \(f(t)\), \(G_{l}\left( t,t\right) \) and \(\sigma _{Y}^{2}(t, \mathbf {x})\) with \(\hat{f}(t)\), \(\hat{G}_{l}\left( t,t\right) \) and \(\hat{ \sigma }_{Y}^{2}(t,\mathbf {x})\). Define
$$\begin{aligned} \hat{\mathbf {{\Gamma }}}_{n}\left( t\right)&\equiv \left[ n^{-1} \sum _{i=1}^{n}X_{il}X_{il^{\prime }}\hat{\sigma }_{Y}^{2}(t,\mathbf {X} _{i})\left\{ \hat{f}(t)h_{{\mathrm{s}}}N_{{\mathrm{T}}}\right\} ^{-1}\right. \\&\left. \quad \times \left\{ 1+\left( \frac{\sum _{i=1}^{n}N_{i}^{2}}{N_{{\mathrm{T }}}}-1\right) \frac{\sum _{l=1}^{d}\hat{G}_{l}\left( t,t\right) X_{il}^{2}}{ \hat{\sigma }_{Y}^{2}(t,\mathbf {X}_{i})}\hat{f}(t)h_{{\mathrm{s}}}\right\} \right] _{l,l^{\prime }=1}^{d}. \end{aligned}$$
The following proposition provides the rate at which \(\hat{\mathbf {{\Gamma }}}_{n}(t)\) converges to \(\mathbf {\Gamma }_{n}(t)\).

Proposition 2

Under Assumptions (A1)–(A6) in Appendix A, there exists a constant \(c>0\) such that as \(n\rightarrow \infty \), \(\left\| \hat{\mathbf {{\Gamma }}}_{n}(t)-\mathbf {\Gamma }_{n}(t)\right\| _{\infty }= \mathcal {O}_{p}\left( n^{-c}\right) \).

Proposition 2 implies that \(\mathbf {\Gamma }_{n}(t)\) can be replaced by \(\hat{\mathbf {{\Gamma }}}_{n}(t)\) with a negligible error. Define a \(d\times d\) matrix \(\hat{\mathbf {H}}=\left\{ n^{-1}\sum _{i=1}^{n}X_{il}X_{il^{\prime }}\right\} _{l,l^{\prime }=1}^{d}\), then \(\mathbf {\Sigma }_{n}(t)\) can be estimated well by \(\hat{\mathbf {\Sigma }}_{n}(t)=\left\{ \hat{\sigma }_{n,ll^{\prime }}^{2}(t)\right\} _{l,l^{\prime }=1}^{d}=\hat{\mathbf {H}}^{-1}\hat{\mathbf {{\Gamma }}}_{n}\left( t\right) \hat{\mathbf {H}}^{-1}\). Therefore, as \(n\rightarrow \infty \), \(l=1,\ldots ,d\), the SCCs
$$\begin{aligned} \hat{m}_{l}(t)\pm \hat{\sigma }_{n,ll}(t)Q_{N_{{\mathrm{s}}}+1}\left( \alpha \right) , \end{aligned}$$
(11)
with \(Q_{N_{{\mathrm{s}}}+1}\left( \alpha \right) \) given in (9), and the pointwise intervals \(\hat{m}_{l}(t)\pm \hat{\sigma } _{n,ll}(t)Z_{1-\alpha /2}\) have asymptotic confidence level \(1-\alpha \).
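To make the plug-in step concrete, the sketch below evaluates \(\hat{\mathbf {\Gamma }}_{n}(t)\), \(\hat{\mathbf {H}}\) and \(\hat{\mathbf {\Sigma }}_{n}(t)\) at a single \(t\) and returns the half-widths \(\hat{\sigma }_{n,ll}(t)\) times a supplied critical value. It assumes that the pilot estimates \(\hat{f}(t)\), \(\hat{G}_{l}(t,t)\) and \(\hat{\sigma }^{2}(t)\) have already been computed; the function name and argument layout are hypothetical.

```python
import numpy as np


def band_half_width(X, N, f_hat, G_hat, sigma2_hat, h_s, critical_value):
    """Half-widths sigma_hat_{n,ll}(t) * critical_value for l = 1, ..., d.

    X: (n, d) time-independent covariates, one row per subject
    N: (n,) number of observations N_i per subject
    f_hat: scalar estimate of f(t); G_hat: (d,) estimates of G_l(t, t)
    sigma2_hat: scalar estimate of sigma^2(t); h_s: bin width 1 / (N_s + 1)
    critical_value: Q_{N_s+1}(alpha) for an SCC, Z_{1-alpha/2} for pointwise
    """
    n = X.shape[0]
    N_T = N.sum()
    func_var = (X ** 2) @ G_hat            # sum_l G_l(t,t) X_il^2, per subject
    sigma2_Y = func_var + sigma2_hat       # sigma_Y^2(t, X_i), per subject
    ratio = (N ** 2).sum() / N_T - 1.0     # estimates E{N_1(N_1 - 1)} / E N_1
    weights = sigma2_Y / (f_hat * h_s * N_T) * (
        1.0 + ratio * func_var / sigma2_Y * f_hat * h_s
    )
    Gamma = (X * weights[:, None]).T @ X / n
    H_inv = np.linalg.inv(X.T @ X / n)
    Sigma = H_inv @ Gamma @ H_inv
    return critical_value * np.sqrt(np.diag(Sigma))
```

The band at \(t\) is then \(\hat{m}_{l}(t)\) plus or minus the \(l\)th entry of the returned vector.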

4 Decomposition

In this section, we describe the representation of the spline estimators \( \hat{m}_{l}(t)\), \(l=1,\ldots ,d\), in (4), then break the estimation error \(\hat{m}_{l}(t)-m_{l}(t)\) into three terms by the decomposition of \(Y_{ij}\) in model (3). Although such representation is not needed for applying the procedure described in Sect. 3.2 to analyze data, it provides insights into the proof of the main theoretical results in Sect. 2.

We consider the following rescaled B-spline basis \(\left\{ B_{J}(t)\right\} _{J=0}^{N_{{\mathrm{s}}}}\) for \(G^{\left( -1\right) }\):
$$\begin{aligned} B_{J}(t)\equiv b_{J}(t)\left( c_{J,n}\right) ^{-1/2},\quad J=0,\ldots ,N_{ {\mathrm{s}}}. \end{aligned}$$
(12)
It is easily verified that \({{\mathsf {E}}}\{B_{J}(T)\}^{2}=1\) for \( J=0,1,\ldots ,N_{{\mathrm{s}}}\), and \(B_{J}(t)B_{J^{\prime }}(t)\equiv 0\) for \( J\ne J^{\prime }\). By simple linear algebra, the spline estimator \(\hat{m} _{l}(t)\) defined in (4) equals
$$\begin{aligned} \hat{m}_{l}(t)=\sum _{J=0}^{N_{{\mathrm{s}}}}\hat{\gamma } _{J,l}B_{J}(t)=c_{J(t),n}^{-1/2}\hat{\gamma }_{J(t),l},\quad l=1,\ldots ,d, \end{aligned}$$
(13)
where the coefficients \(\hat{\mathbf {\gamma }}=\left( \hat{\mathbf {\gamma }} _{0}^{\scriptstyle {\mathsf {T}}},\ldots ,\hat{\mathbf {\gamma }}_{N_{{\mathrm{s}} }}^{\scriptstyle {\mathsf {T}}}\right) ^{\scriptstyle {\mathsf {T}}}\) with \(\hat{ \mathbf {\gamma }}_{J}=\left( \hat{\gamma }_{J,1},\ldots ,\hat{\gamma } _{J,d}\right) ^{\scriptstyle {\mathsf {T}}}\) being the solution of the following least squares problem
$$\begin{aligned} \hat{\mathbf {\gamma }}= \mathop {{\mathrm{argmin}}}_{\mathbf {\gamma }=\left( \gamma _{0,1},\ldots ,\gamma _{N_{{\mathrm{s}}},d}\right) ^{\scriptstyle {\mathsf {T}} }\in \mathbb {R}^{d\left( N_{{\mathrm{s}}}+1\right) }}\sum _{i=1}^{n}\sum _{j=1}^{N_{i}}\left\{ Y_{ij}-\sum _{l=1}^{d}\sum _{J=0}^{N_{{\mathrm{s}}}}\gamma _{J,l}B_{J}\left( T_{ij}\right) X_{il}\right\} ^{2}. \end{aligned}$$
(14)
In the following, let \(\mathbf {Y}=\left( Y_{11},\ldots ,Y_{1N_{1}},\ldots ,Y_{n1},\ldots ,Y_{nN_{n}}\right) ^{\scriptstyle {\mathsf {T}}}\) be the collection of all the \(Y_{ij}\)s. Let \(\mathbf {B}(t)=\left( B_{0}(t),\ldots ,B_{N_{{\mathrm{s}}}}(t)\right) ^{\scriptstyle {\mathsf {T}}}\) and \(\mathbf {X} _{i}=\left( X_{i1},\ldots ,X_{id}\right) ^{\scriptstyle {\mathsf {T}}}\) be two vectors of dimension \(\left( N_{{\mathrm{s}}}+1\right) \) and \(d\), respectively. Denote
$$\begin{aligned} \mathbf {D}=\left( \mathbf {B}(T_{11})\otimes \mathbf {X}_{1},\ldots ,\mathbf {B} (T_{1N_{1}})\otimes \mathbf {X}_{1},\ldots ,\mathbf {B}(T_{n1})\otimes \mathbf { X}_{n},\ldots ,\mathbf {B}(T_{nN_{n}})\otimes \mathbf {X}_{n}\right) ^{ \scriptstyle {\mathsf {T}}}, \end{aligned}$$
(15)
a \(N_{{\mathrm{T}}}\times \left( \left( N_{{\mathrm{s}}}+1\right) d\right) \) matrix, where “ \(\otimes \)” denotes the Kronecker product. Solving the least squares problem in (14), we obtain
$$\begin{aligned} \hat{\mathbf {\gamma }}=\left( \mathbf {D}^{\scriptstyle {\mathsf {T}}}\mathbf {D} \right) ^{-1}\left( \mathbf {D}^{\scriptstyle {\mathsf {T}}}\mathbf {Y}\right) . \end{aligned}$$
(16)
Denote \(\mathbf {x}=\left( x_{1},\ldots ,x_{d}\right) ^{\scriptstyle {\mathsf {T }}}\), thus Eq. (4) can be rewritten as
$$\begin{aligned} \sum _{l=1}^{d}\hat{m}_{l}(t)x_{l}=\left( \mathbf {B}(t)\otimes \mathbf {x} \right) ^{\scriptstyle {\mathsf {T}}}\left( \mathbf {D}^{\scriptstyle {\mathsf {T} }}\mathbf {D}\right) ^{-1}\left( \mathbf {D}^{\scriptstyle {\mathsf {T}}}\mathbf { Y}\right) . \end{aligned}$$
(17)
According to (15), one has \(\mathbf {D}^{\scriptstyle {\mathsf {T}}}\mathbf {D}=\sum _{i=1}^{n}\sum _{j=1}^{N_{i}}\left\{ \mathbf {B}(T_{ij})\mathbf {B}(T_{ij})^{\scriptstyle {\mathsf {T}}}\otimes \mathbf {X}_{i}\mathbf {X}_{i}^{\scriptstyle {\mathsf {T}}}\right\} \), in which the matrix \(\mathbf {B}(T_{ij})\mathbf {B}(T_{ij})^{\scriptstyle {\mathsf {T}}}={\mathrm{diag}}\left\{ B_{0}^{2}(T_{ij}),\ldots ,B_{N_{{\mathrm{s}}}}^{2}(T_{ij})\right\} \) is diagonal because the constant spline basis functions have disjoint supports. Hence \(\mathbf {D}^{\scriptstyle {\mathsf {T}}}\mathbf {D}\) is a block diagonal matrix, and \(N_{{\mathrm{T}}}^{-1}\mathbf {D}^{\scriptstyle {\mathsf {T}}}\mathbf {D}={\mathrm{diag}}\left\{ \hat{\mathbf {V}}_{0},\ldots ,\hat{\mathbf {V}}_{N_{{\mathrm{s}}}}\right\} \), where
$$\begin{aligned} {\hat{\mathbf {V}}}_{J}=\left\{ N_{{\mathrm{T}}}^{-1}\sum _{i=1}^{n} \sum _{j=1}^{N_{i}}B_{J}^{2}(T_{ij})X_{il}X_{il^{\prime }}\right\} _{l,l^{\prime }=1}^{d}. \end{aligned}$$
(18)
On the other hand, we have \(\mathbf {D}^{\scriptstyle {\mathsf {T}}}\mathbf {Y} =\sum _{i=1}^{n}\sum _{j=1}^{N_{i}}\left\{ \mathbf {B}(T_{ij})\otimes \mathbf {X} _{i}\right\} Y_{ij}\). Thus, \(\hat{\mathbf {\gamma }}=\left( \hat{\mathbf { \gamma }}_{0}^{\scriptstyle {\mathsf {T}}},\ldots ,\hat{\mathbf {\gamma }}_{N_{ {\mathrm{s}}}}^{\scriptstyle {\mathsf {T}}}\right) ^{\scriptstyle {\mathsf {T}}}\) can be easily calculated using
$$\begin{aligned} \hat{\mathbf {\gamma }}_{J}=\hat{\mathbf {V}}_{J}^{-1}\left\{ N_{{\mathrm{T}} }^{-1}\sum _{i=1}^{n}\sum _{j=1}^{N_{i}}B_{J}(T_{ij})X_{il}Y_{ij}\right\} _{l=1}^{d},\quad J=0,\ldots ,N_{{\mathrm{s}}}. \end{aligned}$$
(19)
Then the functions \(\mathbf {m}(t)=\left( {m}_{1}(t),\ldots ,{m} _{d}(t)\right) ^{\scriptstyle {\mathsf {T}}}\) can be simply estimated by
$$\begin{aligned} \hat{\mathbf {m}}(t)=\left( \hat{m}_{1}(t),\ldots ,\hat{m}_{d}(t)\right) ^{ \scriptstyle {\mathsf {T}}}=c_{J(t),n}^{-1/2}\left( \hat{\gamma } _{J(t),1},\ldots ,\hat{\gamma }_{J(t),d}\right) ^{\scriptstyle {\mathsf {T}} }=c_{J(t),n}^{-1/2}\hat{\mathbf {\gamma }}_{J(t)}. \end{aligned}$$
(20)
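The block structure in (18)–(20) means the fit reduces to one small least squares problem per knot interval. The following is a minimal sketch of that idea, not the authors' code. It assumes times rescaled to \([0,1]\), equally spaced knots, and unscaled indicator basis functions, so that the constant \(c_{J,n}\) cancels and each coefficient vector estimates \(\mathbf {m}(t)\) directly. The function names `fit_constant_spline_vc` and `evaluate` are hypothetical.

```python
import numpy as np


def fit_constant_spline_vc(T, X, Y, n_interior):
    """Piecewise-constant spline fit of a varying coefficient model.

    T: (N_T,) times in [0, 1]; X: (N_T, d) covariates, with each subject's
    row repeated once per observation; Y: (N_T,) responses.
    Returns an (n_interior + 1, d) array whose row J estimates m(t) on bin J.
    Assumption: equally spaced knots and unscaled indicator basis functions.
    """
    T, X, Y = np.asarray(T, float), np.asarray(X, float), np.asarray(Y, float)
    n_bins = n_interior + 1
    # J(t): index of the subinterval containing t (t = 1 goes to the last bin).
    J = np.minimum((T * n_bins).astype(int), n_bins - 1)
    d = X.shape[1]
    m_hat = np.full((n_bins, d), np.nan)
    for j in range(n_bins):
        Xj, Yj = X[J == j], Y[J == j]
        if len(Yj) >= d:
            # One diagonal block: least squares of Y on X within bin j.
            m_hat[j] = np.linalg.lstsq(Xj, Yj, rcond=None)[0]
    return m_hat


def evaluate(m_hat, t):
    """Evaluate the fitted step functions at the time points t."""
    n_bins = m_hat.shape[0]
    J = np.minimum((np.asarray(t, float) * n_bins).astype(int), n_bins - 1)
    return m_hat[J]
```

Bins with fewer than \(d\) observations are left as NaN in this sketch, which mirrors the singularity issue that motivates a smaller number of knots when data are sparse.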
Projecting the relationship in model (3) onto the space \(\mathcal {M}\) of spline coefficient functions on \(\mathcal {T}\times \mathbb {R}^{d}\), we obtain the following important decomposition:
$$\begin{aligned} \sum _{l=1}^{d}\hat{m}_{l}(t)x_{l}=\sum _{l=1}^{d}\tilde{m}_{l}(t)x_{l}+ \sum _{l=1}^{d}\tilde{\xi }_{l}(t)x_{l}+\sum \limits _{l=1}^{d}\tilde{\varepsilon }_{l}(t)x_{l}, \end{aligned}$$
(21)
where for any \(l=1,\ldots ,d,\)
$$\begin{aligned}&\displaystyle \tilde{m}_{l}(t)=\sum _{J=0}^{N_{{\mathrm{s}}}}\tilde{\gamma } _{J,l}B_{J}(t)=c_{J(t),n}^{-1/2}\tilde{\gamma }_{J(t),l}, \end{aligned}$$
(22)
$$\begin{aligned}&\displaystyle \tilde{\xi }_{l}(t)=\sum _{J=0}^{N_{{\mathrm{s}}}}\tilde{\alpha } _{J,l}B_{J}(t)=c_{J(t),n}^{-1/2}\tilde{\alpha }_{J(t),l},\quad \tilde{\varepsilon } _{l}(t)=\sum _{J=0}^{N_{{\mathrm{s}}}}\tilde{\theta } _{J,l}B_{J}(t)=c_{J(t),n}^{-1/2}\tilde{\theta }_{J(t),l}, \end{aligned}$$
(23)
where \((\tilde{\gamma }_{J,l},J=0,\ldots ,N_{{\mathrm{s}}},l=1,\ldots ,d)^{ \scriptstyle {\mathsf {T}}}\), \(\left( \tilde{\alpha }_{J,l},J=0,\ldots ,N_{ {\mathrm{s}}},l=1,\ldots ,d\right) ^{\scriptstyle {\mathsf {T}}}\), and \((\tilde{ \theta }_{J,l},J=0,\ldots ,N_{{\mathrm{s}}},l=1,\ldots ,d)^{\scriptstyle {\mathsf {T }}}\) are solutions to (14) with \(Y_{ij}\) replaced by \( \sum _{l=1}^{d}m_{l}\left( T_{ij}\right) X_{il}\), \(\sum _{l=1}^{d}\sum _{k=1}^{ \infty }\xi _{ik,l}\phi _{k,l}\left( T_{ij}\right) X_{il}\), and \(\sigma \left( T_{ij}\right) \varepsilon _{ij}\), respectively.
Furthermore, under Assumption (A5) we can decompose \(\hat{m}_{l}(t)\) as
$$\begin{aligned} \hat{m}_{l}(t)=\tilde{m}_{l}(t)+\tilde{\xi }_{l}(t)+\tilde{\varepsilon } _{l}(t),\qquad l=1,\ldots ,d. \end{aligned}$$
(24)
The next two propositions concern the functions \(\tilde{m}_{l}(t)\), \(\tilde{ \xi }_{l}(t),\) \(\tilde{\varepsilon }_{l}(t)\), \(l=1,\ldots ,d\), given in (22) and (23). Proposition 3 gives the uniform convergence rate of \(\tilde{m} _{l}(t)\) to \(m_{l}(t)\). Proposition 4 provides the asymptotic distribution for the maximum of the normalized error terms.

Proposition 3

Under Assumptions (A1), (A2) and (A4)–(A6) in Appendix A, the functions \(\tilde{m}_{l}(t)\), \(l=1,\ldots ,d\) satisfy \( \sup _{t\in \left[ 0,1\right] }\sup _{1\le l\le d}\left| \tilde{m} _{l}(t)-m_{l}(t)\right| =\mathcal {O}_{p}(h_{{\mathrm{s}}}).\)

Proposition 4

Under Assumptions (A2)–(A6) in Appendix A, for \(\tau \in \mathbb {R}\), and \(\sigma _{n,ll}(t)\), \(a_{N_{{\mathrm{s}}}+1}\), \(b_{N_{ {\mathrm{s}}}+1}\) as given in (7) and (9),
$$\begin{aligned} \lim \limits _{n\rightarrow \infty }P\left\{ \sup _{t\in \left[ 0,1\right] }\sigma _{n,ll}^{-1}(t)\left| \tilde{\xi }_{l}(t)+\tilde{\varepsilon } _{l}(t)\right| \le \tau /a_{N_{{\mathrm{s}}}+1}+b_{N_{{\mathrm{s}}}+1}\right\} =\exp \left( -2\mathrm{e}^{-\tau }\right) . \end{aligned}$$
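As a worked step, the limit in Proposition 4 can be inverted to obtain a critical value at level \(1-\alpha \). The derivation below uses only the statement above; the constants \(a_{N_{{\mathrm{s}}}+1}\) and \(b_{N_{{\mathrm{s}}}+1}\) are those of (9), which is not restated in this section, and the symbol \(\tau _{\alpha }\) is introduced here only for illustration.
$$\begin{aligned} \exp \left( -2\mathrm{e}^{-\tau _{\alpha }}\right) =1-\alpha \quad \Longleftrightarrow \quad \tau _{\alpha }=-\log \left\{ -\frac{1}{2}\log \left( 1-\alpha \right) \right\} , \end{aligned}$$
so that, asymptotically with probability \(1-\alpha \),
$$\begin{aligned} \sup _{t\in \left[ 0,1\right] }\sigma _{n,ll}^{-1}(t)\left| \tilde{\xi }_{l}(t)+\tilde{\varepsilon }_{l}(t)\right| \le b_{N_{{\mathrm{s}}}+1}-a_{N_{{\mathrm{s}}}+1}^{-1}\log \left\{ -\frac{1}{2}\log \left( 1-\alpha \right) \right\} . \end{aligned}$$
For example, \(\tau _{0.05}\approx 3.663\) and \(\tau _{0.01}\approx 5.293\).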

5 Simulation

To illustrate the finite-sample performance of the spline approach, we generate data from the following model
$$\begin{aligned} Y_{ij}&= \left\{ m_{1}\left( T_{ij}\right) +\sum _{k=1}^{2}\xi _{ik,1}\phi _{k,1}\left( T_{ij}\right) \right\} X_{i1}+\left\{ m_{2}\left( T_{ij}\right) +\sum _{k=1}^{3}\xi _{ik,2}\phi _{k,2}\left( T_{ij}\right) \right\} X_{i2} \\&+\sigma \left( T_{ij}\right) \varepsilon _{ij},\quad 1\le i\le n,1\le j\le N_{i}, \end{aligned}$$
where \(T\sim U[0,1]\), \(X_{1}\sim N(0,1)\), \(X_{2}\sim \hbox {Binomial}[1,0.5]\), \(\xi _{k,1}\sim N(0,1)\), \(k=1,2\), \(\xi _{k,2}\sim N(0,1)\), \(k=1,2,3,\) \(\varepsilon \sim N(0,1)\), and \(N_{i}\) is generated from a discrete uniform distribution on \(\{2,\ldots ,14\}\), for \(1\le i\le n\). For the first component, we take \(m_{1}(t)=\sin \left\{ 2\pi \left( t-1/2\right) \right\} \), \(\phi _{1,1}(t)=-2\cos \left\{ \pi \left( t-1/2\right) \right\} /\sqrt{5}\), \(\phi _{2,1}(t)=\sin \left\{ \pi \left( t-1/2\right) \right\} /\sqrt{5}\), thus \(\lambda _{1,1}=2/5\), \(\lambda _{2,1}=1/10\). For the second component, we take \(m_{2}(t)=5\left( t-0.6\right) ^{2}\), \(\phi _{1,2}(t)=1\), \(\phi _{2,2}(t)=\sqrt{2}\sin \left( 2\pi t\right) \), \(\phi _{3,2}(t)=\sqrt{2}\cos \left( 2\pi t\right) \), thus \(\lambda _{1,2}=\lambda _{2,2}=\lambda _{3,2}=1\). The noise level is chosen to be \(\sigma =0.5,1.0\), and the number of subjects \(n\) is taken to be \(200,400,600,800\).
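The data-generating model above can be sketched in a few lines. This is an illustrative reading of the stated design, not the authors' simulation code: it takes \(\sigma (t)\) to be the constant noise level \(\sigma \), and the function names (`simulate`, `m1`, `m2`, `phi1`, `phi2`) and the record layout are hypothetical.

```python
import math
import random


def m1(t):
    return math.sin(2 * math.pi * (t - 0.5))


def m2(t):
    return 5 * (t - 0.6) ** 2


def phi1(t):
    # Rescaled eigenfunctions of the first component (eigenvalues 2/5, 1/10).
    return (-2 * math.cos(math.pi * (t - 0.5)) / math.sqrt(5),
            math.sin(math.pi * (t - 0.5)) / math.sqrt(5))


def phi2(t):
    # Rescaled eigenfunctions of the second component (all eigenvalues 1).
    return (1.0,
            math.sqrt(2) * math.sin(2 * math.pi * t),
            math.sqrt(2) * math.cos(2 * math.pi * t))


def simulate(n, sigma, seed=0):
    """Return a list of records (i, T_ij, X_i1, X_i2, Y_ij)."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        n_i = rng.randint(2, 14)            # N_i uniform on {2, ..., 14}
        x1 = rng.gauss(0.0, 1.0)            # X_1 ~ N(0, 1)
        x2 = float(rng.random() < 0.5)      # X_2 ~ Binomial[1, 0.5]
        xi1 = [rng.gauss(0.0, 1.0) for _ in range(2)]
        xi2 = [rng.gauss(0.0, 1.0) for _ in range(3)]
        for _ in range(n_i):
            t = rng.random()                # T ~ U[0, 1]
            eta1 = m1(t) + sum(a * p for a, p in zip(xi1, phi1(t)))
            eta2 = m2(t) + sum(a * p for a, p in zip(xi2, phi2(t)))
            y = eta1 * x1 + eta2 * x2 + sigma * rng.gauss(0.0, 1.0)
            rows.append((i, t, x1, x2, y))
    return rows
```

The subject-level scores `xi1` and `xi2` are drawn once per subject and shared by all of that subject's observations, which is what induces the within-subject dependence.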
We consider the confidence levels \(1-\alpha =0.95\) and \(0.99\). Table 1 reports the coverage of the SCCs as the percentage out of the total \(500\) replications for which the true curve was covered by (11) at the \(101\) points \(\{k/100,k=0,\ldots ,100\}\).
Table 1 Coverage percentages of the SCCs for functions \(m_{1}\) (left) and \(m_{2}\) (right), based on \(500\) replications

| \(\sigma \) | \(n\) | \(1-\alpha \) | \(c=0.3\) | \(c=0.5\) | \(c=0.8\) | \(c=1\) |
|---|---|---|---|---|---|---|
| 1.0 | 200 | 0.950 | 0.950, 0.952 | 0.944, 0.948 | 0.920, 0.904 | 0.886, 0.884 |
| | | 0.990 | 0.990, 0.998 | 0.990, 0.990 | 0.976, 0.984 | 0.968, 0.974 |
| | 400 | 0.950 | 0.944, 0.948 | 0.950, 0.930 | 0.922, 0.912 | 0.908, 0.904 |
| | | 0.990 | 0.996, 0.984 | 0.990, 0.988 | 0.984, 0.988 | 0.974, 0.966 |
| | 600 | 0.950 | 0.934, 0.962 | 0.954, 0.946 | 0.930, 0.952 | 0.930, 0.924 |
| | | 0.990 | 0.992, 0.996 | 0.992, 0.986 | 0.988, 0.990 | 0.984, 0.990 |
| | 800 | 0.950 | 0.936, 0.934 | 0.960, 0.966 | 0.950, 0.964 | 0.956, 0.934 |
| | | 0.990 | 0.998, 0.996 | 0.994, 0.994 | 0.986, 0.992 | 0.988, 0.988 |
| 0.5 | 200 | 0.950 | 0.936, 0.948 | 0.952, 0.942 | 0.916, 0.900 | 0.912, 0.890 |
| | | 0.990 | 0.988, 0.994 | 0.992, 0.990 | 0.972, 0.974 | 0.972, 0.972 |
| | 400 | 0.950 | 0.916, 0.930 | 0.936, 0.932 | 0.928, 0.916 | 0.904, 0.898 |
| | | 0.990 | 0.994, 0.984 | 0.992, 0.988 | 0.996, 0.988 | 0.978, 0.976 |
| | 600 | 0.950 | 0.924, 0.948 | 0.952, 0.954 | 0.926, 0.958 | 0.936, 0.938 |
| | | 0.990 | 0.996, 0.994 | 0.994, 0.986 | 0.984, 0.990 | 0.990, 0.994 |
| | 800 | 0.950 | 0.942, 0.900 | 0.950, 0.960 | 0.942, 0.962 | 0.960, 0.938 |
| | | 0.990 | 0.996, 0.998 | 0.996, 0.994 | 0.990, 0.996 | 0.992, 0.988 |
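The coverage figures reported in Table 1 follow a simple rule: a replication counts as a success only if the true curve lies inside the corridor at every one of the \(101\) grid points. A minimal sketch of that bookkeeping is given below; the function names and the representation of a corridor as a pair of lists are assumptions made for illustration.

```python
def covers(true_fn, lower, upper, grid):
    """True if lower[k] <= true_fn(grid[k]) <= upper[k] at every grid point."""
    return all(lo <= true_fn(t) <= up for t, lo, up in zip(grid, lower, upper))


def coverage_rate(true_fn, bands, grid):
    """Fraction of replications whose band covers the true curve on the grid.

    bands: iterable of (lower, upper) pairs, one pair per replication.
    """
    hits = [covers(true_fn, lo, up, grid) for lo, up in bands]
    return sum(hits) / len(hits)


# The 101 evaluation points k/100, k = 0, ..., 100.
GRID = [k / 100 for k in range(101)]
```

A single grid point outside the corridor makes the whole replication a miss, which is why simultaneous coverage is a stricter criterion than pointwise coverage.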

In the above SCC construction, the number of interior knots \(N_{{\mathrm{s}}}\) is determined by the sample size \(n\) and a tuning constant \(c\), as described in Sect. 3.2. We have experimented with \(c=0.3,0.5,0.8,1.0\) in this simulation study. The results in Table 1 show that the coverage percentages depend on the choice of \(c\); however, the dependence becomes weaker as the sample size increases. For the larger sample sizes \(n=600,800\), the choice of \(c\) has little effect on the coverage percentages. Because \(N_{{\mathrm{s}}}\) varies with \(N_{i}\), for \(1\le i\le n\), the data-driven selection of an “optimal” \(N_{{\mathrm{s}}}\) remains an open problem. At both noise levels, the coverage percentages for the SCC (11) are very close to the nominal confidence levels \(0.95\) and \(0.99\) for \(c=0.5\). Note that since \({{\mathsf {E}}}N_{1}=8\), the total sample size is \(N_{{\mathrm{T}}}\approx 8\times 200,8\times 400,8\times 600,8\times 800\), which explains the closeness of the coverage percentages in Table 1 to the nominal levels. Such large values of \(N_{{\mathrm{T}}}\) are realistic, as we believe they are common for real data. For instance, the CD4 cell percentage data in Sect. 6 have \(N_{{\mathrm{T}}}=1{,}817\).

To visualize the actual function estimates, Fig. 1 shows the true curve, the estimated curve, the asymptotic 95 % SCC and the pointwise confidence intervals at \(\sigma =0.5\) with \(n=200\). The corresponding plot for \(n=600\) shows markedly narrower SCCs and pointwise confidence intervals, as expected, but is omitted to save space.
Fig. 1 Plots of the 95 % SCC (11) (upper and lower solid), pointwise confidence intervals (dashed), the spline estimator (thin), and the true function (middle thick) at \(\sigma =0.5\), \(n=200\) for \(m_{1}\) (left) and \(m_{2}\) (right)

6 Real data analysis

To illustrate our method, we return to the CD4 cell percentage data discussed in Sect. 1 for further analysis. Since the actual visit times \(T_{ij}\) are irregularly spaced and vary from year 0 to year 6, we first transform the times by \(Z_{ij}=F_{N_{T}}\left( T_{ij}\right) \), where \(F_{N_{T}}\) is the empirical cdf of times \(\left\{ T_{ij}\right\} _{i=1,j=1}^{n,N_{i}}\). Then the \(Z_{ij}\) values are distributed fairly uniformly. We have set a slightly smaller number of interior knots \(N_{{\mathrm{ s}}}=[0.3N_{{\mathrm{T}}}^{1/3}(\log (n))]\) to avoid singularity in solving the least squares problem.
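The two preprocessing steps described above, the empirical cdf transform of the visit times and the choice of the number of interior knots, can be sketched as follows. This is an illustration under stated assumptions: the brackets in the knot formula are read as "integer part", and the function names are hypothetical.

```python
import math
from bisect import bisect_right


def ecdf_transform(times):
    """Map each time t to F_N(t) = #{observations <= t} / N."""
    ordered = sorted(times)
    n_total = len(ordered)
    return [bisect_right(ordered, t) / n_total for t in times]


def num_interior_knots(n_total, n_subjects, c=0.3):
    """Integer part of c * N_T^(1/3) * log(n), with c = 0.3 as in the text."""
    return int(c * n_total ** (1 / 3) * math.log(n_subjects))
```

Tied visit times receive the same transformed value, so the transformed times are only approximately uniform when ties are present.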

The left plots of Figs. 2, 3, 4 and 5 depict the spline estimates, the asymptotic \(95\,\%\) SCCs and the pointwise confidence intervals for \(m_{l}\left( t\right) \), \(l=0,1,2,3\), respectively. The horizontal solid line represents zero. Based on the shape of the SCCs, we are interested in testing the following hypotheses:

\(H_{00}:m_{0}\left( t\right) \equiv a+bt\), for some \(a,b\in \mathbb {R}\) vs. \(H_{10}:m_{0}\left( t\right) \ne a+bt\), for any \(a,b\in \mathbb {R}\);

\(H_{01}:m_{1}\left( t\right) \equiv 0\) vs. \(H_{11}:m_{1}\left( t\right) \ne 0,\) for some \(t\in [0,6]\);

\(H_{02}:m_{2}\left( t\right) \equiv c\) for some \(c>0\) vs. \(H_{12}:m_{2}\left( t\right) \ne c\), for any \(c>0\);

\(H_{03}:m_{3}\left( t\right) \equiv 0\) vs. \(H_{13}:m_{3}\left( t\right) \ne 0,\) for some \(t\in [0,6]\).
Fig. 2 Plots of (a) the \(95\,\%\) SCC (upper and lower solid), pointwise confidence intervals (dashed) and the spline estimator \(\hat{m}_{0}\) (middle solid) for baseline effect; and (b) the same except with confidence level \(1-\hat{\alpha }_{0}\) and the estimated \(m_{0}\) under \(H_{00}\) (solid linear)

Fig. 3 Plots of (a) the \(95\,\%\) SCC (upper and lower solid), pointwise confidence intervals (dashed) and the spline estimator \(\hat{m}_{1}\) (middle solid) for smoking effect; and (b) the same except with confidence level \(1-\hat{\alpha }_{1}\) and the estimated \(m_{1}\) under \(H_{01}\) (solid linear)

Asymptotic \(p\) values are calculated for each pair of hypotheses as \(\hat{\alpha }_{0}=0.99072\), \(\hat{\alpha }_{1}=0.79723\), \(\hat{\alpha }_{2}=0.25404\), \(\hat{\alpha }_{3}=0.10775\). Since every \(p\) value exceeds \(0.10\), none of the null hypotheses is rejected at conventional significance levels. The \(p\) values are calculated as, for example,
$$\begin{aligned} \hat{\alpha }_{0}=1-\exp \left[ -2\exp \left( -a_{N_{{\mathrm{s}}}+1}\left\{ \max _{k=0}^{400}\left| \frac{\hat{m}_{0}\left( t_{k}\right) -\left( \hat{ a}+\hat{b}t_{k}\right) }{\hat{\sigma }_{n,ll}(t_{k})}\right| -b_{N_{{\mathrm{ s}}}+1}\right\} \right) \right] , \end{aligned}$$
where \(t_{k},k=0,\ldots ,400\) are equally spaced grid points over the range of the actual visit times, while \(\hat{a}+\hat{b}t\) is a least squares linear approximation to \(\hat{m}_{0}\left( t\right) \). In other words, the \(p\) value \(\hat{\alpha }_{0}\) is a solution of
$$\begin{aligned} \max _{k=0}^{400}\left| \frac{\hat{m}_{0}\left( t_{k}\right) -\left( \hat{ a}+\hat{b}t_{k}\right) }{\hat{\sigma }_{n,ll}(t_{k})}\right| =b_{N_{{\mathrm{ s}}}+1}-a_{N_{{\mathrm{s}}}+1}^{-1}\log \left\{ -\frac{1}{2} \log (1-\hat{\alpha }_{0})\right\} . \end{aligned}$$
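The two displays above are inverses of each other, which a short numerical sketch makes concrete. The normalizing constants `a` and `b` stand for \(a_{N_{{\mathrm{s}}}+1}\) and \(b_{N_{{\mathrm{s}}}+1}\) from (9), which is not restated here, so they are passed in as plain arguments; the function names are hypothetical.

```python
import math


def scc_p_value(max_stat, a, b):
    """Asymptotic p value 1 - exp(-2 exp(-a (max_stat - b)))."""
    return 1.0 - math.exp(-2.0 * math.exp(-a * (max_stat - b)))


def critical_value(alpha, a, b):
    """Value of the maximum statistic whose asymptotic p value equals alpha."""
    return b - math.log(-0.5 * math.log(1.0 - alpha)) / a
```

For any positive `a`, a larger maximal standardized deviation gives a smaller \(p\) value, and `scc_p_value(critical_value(alpha, a, b), a, b)` returns `alpha` up to rounding error.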
Fig. 4 Plots of (a) the \(95\,\%\) SCC (upper and lower solid), pointwise confidence intervals (dashed) and the spline estimator \(\hat{m}_{2}\) (middle solid) for pre-infection CD4 effect; and (b) the same except with confidence level \(1-\hat{\alpha }_{2}\) and the estimated \(m_{2}\) under \(H_{02}\) (solid linear)

Fig. 5 Plots of (a) the \(95\,\%\) SCC (upper and lower solid), pointwise confidence intervals (dashed) and the spline estimator \(\hat{m}_{3}\) (middle solid) for age effect; and (b) the same except with confidence level \(1-\hat{\alpha }_{3}\) and the estimated \(m_{3}\) under \(H_{03}\) (solid linear)

The right plots of Figs. 2, 3, 4 and 5 show the spline estimates, the \(100(1-\hat{\alpha }_{l})\%\) SCCs, the pointwise confidence intervals, and the estimates of \(m_{l}\left( t\right) \) under \(H_{0l}\), \(l=0,1,2,3\). From these figures, one can see that the baseline CD4 percentage of the population is a decreasing linear function of time and is greater than zero over the range of time. The effects of smoking status and age at HIV infection are insignificant, while the pre-infection CD4 percentage is positively proportional to the post-infection CD4 percentage. These findings are consistent with the observations in Wu and Chiang (2000), Fan and Zhang (2000) and Wang et al. (2008), but are placed on a more rigorous footing through the quantification of type I errors by computing asymptotic \(p\) values relative to the SCCs.

Notes

Acknowledgments

This work is part of Lijie Gu’s dissertation and has been supported in part by the Deutsche Forschungsgemeinschaft through the CRC 649 “Economic Risk”, the US National Science Foundation awards DMS 0905730, 1007594, 1106816, 1309800, Jiangsu Specially Appointed Professor Program SR10700111, Jiangsu Province Key-Discipline Program (Statistics) ZY107002, National Natural Science Foundation of China award 11371272, and Research Fund for the Doctoral Program of Higher Education of China award 20133201110002.

References

  1. Bosq D (1998) Nonparametric statistics for stochastic processes. Springer, New York
  2. Brumback B, Rice JA (1998) Smoothing spline models for the analysis of nested and crossed samples of curves (with Discussion). J Am Stat Assoc 93:961–994
  3. Cao G, Yang L, Todem D (2012) Simultaneous inference for the mean function based on dense functional data. J Nonparametr Stat 24:359–377
  4. Cao G, Wang J, Wang L, Todem D (2012) Spline confidence bands for functional derivatives. J Stat Plan Inference 142:1557–1570
  5. Chiang CT, Rice JA, Wu CO (2001) Smoothing spline estimation for varying coefficient models with repeatedly measured dependent variables. J Am Stat Assoc 96:605–619
  6. Claeskens G, Van Keilegom I (2003) Bootstrap confidence bands for regression curves and their derivatives. Ann Stat 31:1852–1884
  7. de Boor C (2001) A practical guide to splines. Springer, New York
  8. Fan J, Zhang JT (2000) Functional linear models for longitudinal data. J R Stat Soc Ser B 62:303–322
  9. Fan J, Zhang WY (2000) Simultaneous confidence bands and hypothesis testing in varying-coefficient models. Scand J Stat 27:715–731
  10. Fan J, Zhang WY (2008) Statistical methods with varying coefficient models. Stat Interface 1:179–195
  11. Ferraty F, Vieu P (2006) Nonparametric functional data analysis: theory and practice. Springer, New York
  12. Gabrys R, Horváth L, Kokoszka P (2010) Tests for error correlation in the functional linear model. J Am Stat Assoc 105:1113–1125
  13. Hall P, Müller HG, Wang JL (2006) Properties of principal component methods for functional and longitudinal data analysis. Ann Stat 34:1493–1517
  14. Hall P, Titterington DM (1988) On confidence bands in nonparametric density estimation and regression. J Mult Anal 27:228–254
  15. Härdle W, Luckhaus S (1984) Uniform consistency of a class of regression function estimators. Ann Stat 12:612–623
  16. Hastie T, Tibshirani R (1993) Varying-coefficient models. J R Stat Soc Ser B 55:757–796
  17. Hoover DR, Rice JA, Wu CO, Yang LP (1998) Nonparametric smoothing estimates of time-varying coefficient models with longitudinal data. Biometrika 85:809–822
  18. Horváth L, Kokoszka P (2012) Inference for functional data with applications. Springer, New York
  19. Huang JZ, Wu CO, Zhou L (2002) Varying-coefficient models and basis function approximations for the analysis of repeated measurements. Biometrika 89:111–128
  20. Huang JZ, Wu CO, Zhou L (2004) Polynomial spline estimation and inference for varying coefficient models with longitudinal data. Stat Sin 14:763–788
  21. James GM, Hastie T, Sugar C (2000) Principal component models for sparse functional data. Biometrika 87:587–602
  22. James GM, Sugar CA (2003) Clustering for sparsely sampled functional data. J Am Stat Assoc 98:397–408
  23. Leadbetter MR, Lindgren G, Rootzén H (1983) Extremes and related properties of random sequences and processes. Springer, New York
  24. Liu R, Yang L (2010) Spline-backfitted kernel smoothing of additive coefficient model. Econ Theory 26:29–59
  25. Ma S, Yang L, Carroll RJ (2012) A simultaneous confidence band for sparse longitudinal regression. Stat Sin 22:95–122
  26. Manteiga W, Vieu P (2007) Statistics for functional data. Comput Stat Data Anal 51:4788–4792
  27. Ramsay JO, Silverman BW (2005) Functional data analysis. Springer, New York
  28. Wang L, Li H, Huang JZ (2008) Variable selection in nonparametric varying-coefficient models for analysis of repeated measurements. J Am Stat Assoc 103:1556–1569
  29. Wang L, Yang L (2009) Polynomial spline confidence bands for regression curves. Stat Sin 19:325–342
  30. Wu CO, Chiang CT (2000) Kernel smoothing on varying coefficient models with longitudinal dependent variable. Stat Sin 10:433–456
  31. Wu CO, Chiang CT, Hoover DR (1998) Asymptotic confidence regions for kernel smoothing of a varying-coefficient model with longitudinal data. J Am Stat Assoc 93:1388–1402
  32. Wu Y, Fan J, Müller HG (2010) Varying-coefficient functional linear regression. Bernoulli 16:730–758
  33. Xue L, Yang L (2006) Additive coefficient modelling via polynomial spline. Stat Sin 16:1423–1446
  34. Xue L, Zhu L (2007) Empirical likelihood for a varying coefficient model with longitudinal data. J Am Stat Assoc 102:642–654
  35. Yao W, Li R (2013) New local estimation procedure for a non-parametric regression function for longitudinal data. J R Stat Soc Ser B 75:123–138
  36. Yao F, Müller HG, Wang JL (2005a) Functional linear regression analysis for longitudinal data. Ann Stat 33:2873–2903
  37. Yao F, Müller HG, Wang JL (2005b) Functional data analysis for sparse longitudinal data. J Am Stat Assoc 100:577–590
  38. Zhou L, Huang J, Carroll RJ (2008) Joint modelling of paired sparse functional data using principal components. Biometrika 95:601–619
  39. Zhu H, Li R, Kong L (2012) Multivariate varying coefficient model for functional responses. Ann Stat 40:2634–2666

Copyright information

© Sociedad de Estadística e Investigación Operativa 2014

Authors and Affiliations

  • Lijie Gu (1)
  • Li Wang (2)
  • Wolfgang K. Härdle (3, 4)
  • Lijian Yang (1)
  1. Center for Advanced Statistics and Econometrics Research, Soochow University, Suzhou, China
  2. Department of Statistics, Iowa State University, Ames, USA
  3. Center for Applied Statistics and Economics (C.A.S.E.), Humboldt-Universität zu Berlin, Berlin, Germany
  4. Lee Kong Chian School of Business, Singapore Management University, Singapore, Singapore
