
On linear regression models in infinite dimensional spaces with scalar response

Abstract

In functional linear regression, parameter estimation requires solving a possibly ill-posed inverse problem, which has points of contact with a range of methodologies, including statistical smoothing, deconvolution, and projection onto finite-dimensional subspaces. We discuss the standard approach, based explicitly on functional principal component analysis; however, the choice of the number of basis components remains somewhat subjective and is not always properly discussed and justified. In this work we study the inferential properties of least squares estimation in this context, for different choices of projection subspaces, and we analyze the asymptotic behaviour as the dimension of the subspaces increases.



References

  • Bache K, Lichman M (2013) UCI machine learning repository. https://archive.ics.uci.edu/ml/index.html. Accessed 27 Aug 2015

  • Cardot H, Ferraty F, Sarda P (2003) Spline estimators for the functional linear model. Stat Sin 13:571–591


  • Chiou JM, Müller HG, Wang JL, Carey JR (2003) A functional multiplicative effects model for longitudinal data, with application to reproductive histories of female medflies. Stat Sin 13:1119–1133


  • Cuevas A, Febrero M, Fraiman R (2002) Linear functional regression: the case of fixed design and functional response. Can J Stat 30(2):285–300


  • Frank IE, Friedman JH (1993) A statistical view of some chemometrics regression tools. Technometrics 35:109–148


  • Hastie T, Mallows C (1993) A discussion of "A statistical view of some chemometrics regression tools" by I. E. Frank and J. H. Friedman. Technometrics 35:140–143


  • Hawkins T (1977) Weierstrass and the theory of matrices. Arch Hist Exact Sci 17(2):119–163


  • Horváth L, Kokoszka P (2012) Inference for functional data with applications. Springer, New York


  • Jacques J, Preda C (2014) Model-based clustering for multivariate functional data. Comput Stat Data Anal 71:92–106


  • Koch I, Hoffman P, Marron JS (2013) Proteomics profiles from mass spectrometry. Electron J Stat 8(2):1703–1713


  • Larsen F, van den Berg F, Engelsen S (2006) An exploratory chemometric study of NMR spectra of table wines. J Chemom 20(5):198–208


  • Marx BD, Eilers PH (1996) Generalized linear regression on sampled signals with penalized likelihood. In: Forcina A, Marchetti GM, Hatzinger R, Galmacci G (eds) Statistical modelling. Proceedings of the 11th international workshop on statistical modelling, Orvieto

  • Melas V, Pepelyshev A, Shpilev P, Salmaso L, Corain L, Arboretti R (2014) On the optimal choice of the number of empirical Fourier coefficients for comparison of regression curves. Stat Pap. doi:10.1007/s00362-014-0619-1

  • Osborne BG, Fearn T, Miller AR, Douglas S (1984) Application of near infrared reflectance spectroscopy to the compositional analysis of biscuits and biscuit dough. J Sci Food Agric 35:99–105


  • R Development Core Team (2009) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna. http://www.R-project.org. Accessed 27 Aug 2015

  • Ramsay JO, Silverman BW (2005) Functional data analysis, 2nd edn. Springer, New York


  • Wang G, Zhou J, Wu W, Chen M (2015) Robust functional sliced inverse regression. Stat Pap. doi:10.1007/s00362-015-0695-x


Acknowledgments

The authors wish to thank Piercesare Secchi for stimulating and essential discussions on the topics covered in this paper.


Corresponding author

Correspondence to Anna Maria Paganoni.

Appendices

Appendix 1: Formal characterization of the sub-space E

This section focuses on explicitly computing the following quantities introduced in Sect. 3:

  1. the orthonormal basis of \(E{\text {:}}\,\{\varphi _{k}^{E};\,k=1,\ldots ,d\};\)

  2. the multivariate projection matrix \(P{\text {:}}\,\mathbb {R}^{d}\rightarrow \mathbb {R}^{d}\) that transforms the basis coefficients of elements in D into the basis coefficients of elements in E;

  3. the functional projection operator \(\pi {\text {:}}\,D\rightarrow E\subseteq S\) of D onto S.

Let us consider point (1). First, project the basis of \(D\) (\(\{\varphi _{k}^{D};\,k=1,\ldots ,d\}\)) onto S, obtaining a \(\dim (S)\times d\) matrix A, where \([A]_{ij}=\langle \varphi _{i}^{S},\,\varphi _{j}^{D}\rangle .\) Note that A may have infinitely many rows if \(\dim (S)=\infty .\) The basis of D projected onto S then generates d linearly independent functions, given by \(A^{T}\varvec{\varphi ^{S}(t)},\) which form a basis for E. It is easy to show that the components of \(A^{T}\varvec{\varphi ^{S}(t)}\) are linearly independent, since \(\varphi _{1}^{D},\ldots ,\varphi _{d}^{D}\) are and \(D\cap S^{\perp }=\{0\}.\) To make \(A^{T}\varvec{\varphi ^{S}(t)}\) an orthonormal basis for E, some calculations yield:

$$\begin{aligned} \varvec{\varphi ^{E}(t)}=V_{S}D_{D}^{-1/2}V_{D}^{T}A^{T}\varvec{\varphi ^{S}(t)}, \end{aligned}$$
(26)

where \(D_{D}\) and \(V_{D}\) represent the eigen-structure of \(A^{T}A\) (\(A^{T}AV_{D}=V_{D}D_{D}\)) and \(V_{S}\) is an arbitrary \(d\times d\) orthogonal matrix that allows the basis of E to be changed; without loss of generality, we can take \(V_{S}=I_{d}.\) Note that, except for \(V_{S},\) the basis \(\varvec{\varphi ^{E}(t)}\) is independent of the choice of the bases \(\varvec{\varphi ^{D}(t)}\) and \(\varvec{\varphi ^{S}(t)}.\) It is worth noting that the eigenvalues in \(D_{D}\) are all strictly positive, because \(A^{T}A\) has full rank, which follows from the linear independence of \(\varphi _{1}^{E},\ldots ,\varphi _{d}^{E}.\) Moreover, the eigenvalues in \(D_{D}\) are all less than or equal to one, since A represents a projection operator.
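The construction in (26) can be checked numerically. In the following sketch (assuming numpy; the matrix A is randomly generated and merely stands in for the inner-product matrix \([A]_{ij}=\langle \varphi _{i}^{S},\,\varphi _{j}^{D}\rangle ,\) with \(\dim (S)\) truncated to 20 and \(d=3\)), the coefficient matrix \(D_{D}^{-1/2}V_{D}^{T}\) is built from the eigen-structure of \(A^{T}A\) and shown to orthonormalize \(A^{T}\varvec{\varphi ^{S}(t)}\):

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-in for the inner-product matrix [A]_ij = <phi_i^S, phi_j^D>,
# with dim(S) truncated to 20 and d = 3 (full column rank almost surely).
A = rng.standard_normal((20, 3))

# Eigen-structure of A^T A: A^T A V_D = V_D D_D.
eigvals, V_D = np.linalg.eigh(A.T @ A)

# With V_S = I_d, the coefficient matrix of (26): phi^E(t) = C A^T phi^S(t).
C = np.diag(1 / np.sqrt(eigvals)) @ V_D.T

# Gram matrix of the new basis, C (A^T A) C^T, which must be the identity:
# this is exactly the orthonormality of {phi_k^E}.
gram = C @ (A.T @ A) @ C.T
```

The identity follows because \(C(A^{T}A)C^{T}=D_{D}^{-1/2}V_{D}^{T}(V_{D}D_{D}V_{D}^{T})V_{D}D_{D}^{-1/2}=I_{d}.\)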

Now, consider point (2). From (26) the projection matrix P from D to E can be defined as

$$\begin{aligned} P := \left\langle \varvec{\varphi ^{E}(t)},\,\left( \varvec{\varphi ^{D}(t)}\right) ^{T}\right\rangle =V_{S}D_{D}^{-1/2}V_{D}^{T}A^{T}\left\langle \varvec{\varphi ^{S}(t)},\,\left( \varvec{\varphi ^{D}(t)}\right) ^{T}\right\rangle . \end{aligned}$$

Since \(\langle \varvec{\varphi ^{S}(t)},\,(\varvec{\varphi ^{D}(t)})^{T}\rangle =A\) and \(V_{D}^{T}A^{T}A=D_{D}V_{D}^{T},\) we obtain

$$\begin{aligned} P=V_{S}D_{D}^{1/2}V_{D}^{T}. \end{aligned}$$
(27)

Note that, using (27) we can rewrite (26) as

$$\begin{aligned} \varvec{\varphi ^{E}(t)}=\left( P^{-1}\right) ^{T}A^{T}\varvec{\varphi ^{S}(t)}. \end{aligned}$$

Then, from the vectorial estimate in E given by (9), we can obtain the vectorial estimate in D with \(\varvec{\widehat{\beta }^{D}_{n}}=P^{-1}(\varvec{\widehat{\beta }^{E}_{n}}),\) and finally compute the functional estimate \(\widehat{\beta }^{D}_{n}(t)=(\varvec{\widehat{\beta }^{D}_{n}})^{T}\varvec{\varphi ^{D}(t)}.\) This coincides with the solution of (7).
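As a numerical sanity check of (27) (again assuming numpy, with the same kind of random stand-in for A as above), the matrix \(P=D_{D}^{1/2}V_{D}^{T}\) (taking \(V_{S}=I_{d}\)) satisfies \(P^{T}P=A^{T}A,\) and the change of coordinates between D and E is inverted by \(P^{-1}\):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 3))  # stand-in for [A]_ij = <phi_i^S, phi_j^D>

eigvals, V_D = np.linalg.eigh(A.T @ A)

# Projection matrix (27) with V_S = I_d.
P = np.diag(np.sqrt(eigvals)) @ V_D.T

# Implied identity: P^T P = V_D D_D V_D^T = A^T A.
PtP = P.T @ P

# Coefficients in D map to coefficients in E via P;
# P^{-1} recovers the vectorial estimate in D from the one in E.
b_D_true = np.array([1.0, -2.0, 0.5])
b_E = P @ b_D_true
b_D = np.linalg.solve(P, b_E)
```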

Finally, consider point (3). Using the projection matrix P we can define the functional operator \(\pi \) as follows

$$\begin{aligned} \pi (g)=\left( P\left\langle g,\,\varvec{\varphi ^{D}(t)}\right\rangle \right) ^{T}\varvec{\varphi ^{E}(t)}, \end{aligned}$$

for any \(g\in D.\) Then, using (27) we can easily obtain

$$\begin{aligned} \pi (\cdot )=\left( \left\langle \cdot ,\,\varvec{\varphi ^{D}(t)}\right\rangle \right) ^{T}A^{T}\varvec{\varphi ^{S}(t)}. \end{aligned}$$
(28)

Note that \(\pi \) is independent of any choice of basis of \(S,\,D\) and E. Using (28), once we get the vectorial estimate in E from (9), we can immediately compute the functional estimate \(\widehat{\beta }^{E}_{n}(t)=(\varvec{\widehat{\beta }^{E}_{n}})^{T}\varvec{\varphi ^{E}(t)},\) and then obtain the functional estimate in D,  i.e., \(\widehat{\beta }^{D}_{n}=(\pi )^{-1}(\widehat{\beta }^{E}_{n}).\)
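A finite-dimensional sketch of (28) (an illustration under our own assumption that the ambient space is \(\mathbb {R}^{N},\) with S and D spanned by the orthonormal columns of matrices B_S and B_D; these names are not from the paper): for any \(g\in D,\) the operator \(\pi \) coincides with the orthogonal projection of g onto S:

```python
import numpy as np

rng = np.random.default_rng(2)
N, dim_S, d = 10, 6, 3

# Finite-dimensional stand-in: the ambient space is R^N; S and D are
# subspaces whose orthonormal bases are the columns of B_S and B_D.
B_S, _ = np.linalg.qr(rng.standard_normal((N, dim_S)))
B_D, _ = np.linalg.qr(rng.standard_normal((N, d)))

A = B_S.T @ B_D  # [A]_ij = <phi_i^S, phi_j^D>

g = B_D @ np.array([0.3, -1.2, 2.0])  # an element of D

# pi(g) as in (28): the coefficients <g, phi^D> pushed through A into S's basis.
pi_g = B_S @ (A @ (B_D.T @ g))

# For g in D, pi(g) must equal the orthogonal projection of g onto S.
proj_g = B_S @ (B_S.T @ g)
```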

Appendix 2: Increasing information property

In this section, we discuss an interesting property concerning the behavior of the eigenvalues of the covariance matrix when its dimension increases.

Let \(\{M^{(n)}=[m^{(n)}_{ij}],\,n\ge 1\}\) be a sequence of symmetric matrices such that, for each \(n\ge 1,\) \(M^{(n)}\) is an \(n\times n\) matrix with \(m^{(n)}_{ij}=m^{(n-1)}_{ij}\) for any \(i,\,j\le n-1.\) In other words, \(M^{(n-1)}\) is obtained from \(M^{(n)}\) by deleting the last row and column. The eigenvalues are real, and they are ordered according to the following general result, proved by Cauchy.

Theorem 1

(See Hawkins 1977, p. 125) For the nested sequence \((M^{(n)})_{n}\) of matrices given above, denote by \(\{\lambda ^{n}_{k};\,k=1,\ldots ,n\}\) the decreasingly ordered eigenvalues of \(M^{(n)}.\) Then, for any \(n\ge 1,\)

$$\begin{aligned} \lambda ^{n+1}_{1} \ge \lambda ^{n}_{1} \ge \lambda ^{n+1}_{2} \ge \lambda ^{n}_{2} \ge \lambda ^{n+1}_{3} \ge \cdots \ge \lambda ^{n}_{n} \ge \lambda ^{n+1}_{n+1}. \end{aligned}$$

A direct consequence of the previous theorem is

$$\begin{aligned} \lambda _{i}^{k}\le \lambda _{i}^{d}, \quad \lambda _{i}^{i} \ge \lambda _{k}^{k}, \quad \forall i\le k\le d. \end{aligned}$$
(29)

This result is applied in Sect. 4.2, where \(M^{(n)}\) represents the covariance matrix of the random vector \((\langle X,\,\varphi _{1}\rangle ,\ldots ,\langle X,\,\varphi _{n}\rangle ).\) In this context, a direct interpretation of (29) is that the variance of X projected onto a subspace increases when further components are added.
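The interlacing inequalities of Theorem 1 are easy to verify numerically. The following sketch (assuming numpy; the random symmetric matrix is our own stand-in) checks them for every pair of consecutive nested leading blocks:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 8

# Nested sequence: M^{(k)} is the leading k x k block of a symmetric M.
M = rng.standard_normal((n, n))
M = (M + M.T) / 2

def eig_desc(k):
    # Decreasingly ordered eigenvalues of M^{(k)}.
    return np.sort(np.linalg.eigvalsh(M[:k, :k]))[::-1]

# Cauchy's inequalities: lam^{k+1}_i >= lam^k_i >= lam^{k+1}_{i+1}.
tol = 1e-10
interlaced = all(
    eig_desc(k + 1)[i] + tol >= eig_desc(k)[i] >= eig_desc(k + 1)[i + 1] - tol
    for k in range(1, n)
    for i in range(k)
)
```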

Appendix 3: Simulation settings

The settings of the simulation study presented in Sect. 3 are the following.

Data \(x_{i}(t)\) and the regression coefficient \(\beta (t)\) belong to the Hilbert space \(L^{2}(T),\) where \(T= [-1,\,1]\) is a closed interval.

For each \(i = 1,\ldots ,n,\) where n is the sample size (in our examples \(n=500\)),

$$\begin{aligned} X_{i}(t) = \sum \limits _{j \in J_{i}} \alpha _{j} \eta _{j} \theta ^{X}_{j}(t), \end{aligned}$$

where \(\{\theta _{k}^{X}(t)\} \equiv \{1/\sqrt{2}\} \cup \{\cos {(\pi k t)},\,k = 1,\ldots \},\) the coefficients \(\alpha _{j}\) are randomly sampled from a uniform distribution \(\text {Unif}_{[-10,\,10]},\) \(\eta _{1} = 0.01,\) \(\eta _{j} = 1/j\) for \(j > 1,\) and \(J_{i}\) is a subset of size Z [with Z a Poisson random variable, \(Z \sim {\mathcal {P}}(\lambda )\)] of the integers from 1 to \(2Z.\) We set \(\lambda = 10.\)

Given a function \(\beta (t) \in L^{2}(T),\) the scalar responses \(y_{1},\ldots ,y_{n}\) are generated as \(y_{i} = \int \nolimits _{T} \beta (t)X_{i}(t)\, dt + \epsilon _{i},\) where \(\epsilon _{i} \sim {\mathcal {N}}(0,\,1).\) We repeat the estimation procedure \(M = 100\) times.
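The data-generating process above can be sketched as follows (assuming numpy; the grid size, the Riemann-sum approximation of the integral, and the guard \(Z\ge 1\) are our implementation choices, not part of the paper):

```python
import numpy as np

rng = np.random.default_rng(4)
n, lam = 500, 10
t = np.linspace(-1, 1, 1001)
dt = t[1] - t[0]

def eta(j):
    # eta_1 = 0.01, eta_j = 1/j for j > 1.
    return 0.01 if j == 1 else 1.0 / j

# Each X_i is a random combination of the cosine basis theta_j(t) = cos(pi j t).
X = np.zeros((n, t.size))
for i in range(n):
    Z = max(rng.poisson(lam), 1)  # guard Z >= 1 (our choice for the rare Z = 0)
    J = rng.choice(np.arange(1, 2 * Z + 1), size=Z, replace=False)
    for j in J:
        # alpha_j ~ Unif[-10, 10]
        X[i] += rng.uniform(-10, 10) * eta(j) * np.cos(np.pi * j * t)

# Scalar responses y_i = int_T beta(t) X_i(t) dt + eps_i, via a Riemann sum.
beta = t**2 + 2 * t + 1 / 3
y = (beta * X).sum(axis=1) * dt + rng.standard_normal(n)
```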

In this setting, the space S where the data are generated is the space of even functions in \(L^{2}(T),\) and its orthogonal complement \(S^{\perp }\) consists of the odd functions in \(L^{2}(T).\) By definition, E is the projection of a sub-space D onto S, and hence E is made up of even functions. In particular, we defined E as the space of even polynomials of degree at most 4, i.e., \(E=\mathrm{{Span}}\{1,\,t^{2},\,t^{4}\}.\) For computational purposes, we adopt an equivalent orthonormal basis given by the Legendre polynomials, i.e., \(E=\text {Span}\{\phi _{0},\,\phi _{2},\,\phi _{4}\},\) where

$$\begin{aligned} \phi _{0}=1/\sqrt{2},\quad \phi _{2}=\sqrt{5/8}\left( 3t^{2}-1\right) ,\quad \phi _{4}=\sqrt{9/128} \left( 35 t^{4} - 30 t^{2} +3\right) . \end{aligned}$$
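As a quick check (assuming numpy; the grid and trapezoidal quadrature are our choices), the three Legendre polynomials above are indeed orthonormal on \([-1,\,1]\):

```python
import numpy as np

t = np.linspace(-1, 1, 200001)

# The three normalized Legendre polynomials spanning E.
phi = np.vstack([
    np.full_like(t, 1 / np.sqrt(2)),                 # phi_0
    np.sqrt(5 / 8) * (3 * t**2 - 1),                 # phi_2
    np.sqrt(9 / 128) * (35 * t**4 - 30 * t**2 + 3),  # phi_4
])

# Gram matrix <phi_i, phi_j> on [-1, 1] via the trapezoidal rule,
# which should be (numerically) the 3 x 3 identity.
h = t[1] - t[0]
w = np.full_like(t, h)
w[0] = w[-1] = h / 2
gram = (phi * w) @ phi.T
```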

Figure 1 shows the behavior of the estimator \(\widehat{\beta }^{D}_{n}\) for different choices of D that share the same projected space E. To this end, we introduce a parameter \(\theta \in [0,\,2\pi )\) and define

$$\begin{aligned} D_{\theta } :=\text {Span} \left\{ \cos (\theta )\phi _{0}+\sin (\theta )\phi _{1},\phi _{2}, \phi _{4}\right\} , \end{aligned}$$

where \(\phi _{1}=\sqrt{3/2}t\) is the Legendre polynomial of degree 1. Note that E represents the projection of \(D_{\theta }\) on S for any \(\theta \in [0,\,2\pi ).\) In Fig. 1, we set \(\beta (t)=t^{2}+2t+1/3,\) so that \(\beta (t)\in D_{\pi /3}.\)

In Figs. 2 and 3 we are not interested in the bias on \(S^{\perp },\) and hence we take \(D\equiv E=\text {Span}\{1,\,t^{2},\,t^{4}\}.\) Figure 2 is dedicated to the study of the bias \(\gamma (t),\) so we choose a \(\beta (t)\) that does not lie in D: in particular, \(\beta (t)=\mathbf {1}_{[-0.5,0.5]}(t).\) Figure 3 illustrates the bias–variance trade-off between D and the sub-space generated by the FPCs; hence, we choose a true \(\beta (t)\) that lies in D: in particular, \(\beta (t)=t^{4}.\)


About this article


Cite this article

Ghiglietti, A., Ieva, F., Paganoni, A.M. et al. On linear regression models in infinite dimensional spaces with scalar response. Stat Papers 58, 527–548 (2017). https://doi.org/10.1007/s00362-015-0710-2


Keywords

  • Functional regression
  • Functional principal component analysis
  • Asymptotic properties of statistical inference

Mathematics Subject Classification

  • 62J05
  • 62M10