1 Introduction

In recent years, methods for representing data by functions or curves have received much attention. Such data are known in the literature as functional data (Bongiorno et al. 2014; Ferraty and Vieu 2006; Horváth and Kokoszka 2012; Ramsay and Silverman 2005). Examples of functional data can be found in various application domains, such as medicine, economics, meteorology and many others. In most of the earlier papers on functional data analysis, objects are characterized by only one feature observed at many time points. Methods of functional data analysis are becoming increasingly popular, e.g. in cluster analysis (Jacques and Preda 2013; James and Sugar 2003; Peng and Müller 2008), classification (Chamroukhi et al. 2013; Delaigle and Hall 2012; Mosler and Mozharovskyi 2015; Rossi and Villa 2006) and regression (Ferraty et al. 2012; Goia and Vieu 2014; Kudraszow and Vieu 2013; Peng et al. 2015; Rachdi and Vieu 2006; Wang et al. 2015). Unfortunately, multivariate data methods cannot be used directly for functional data, because of the problem of dimensionality and the difficulty of putting functional data into order. In many applications there is a need for statistical methods for objects characterized by many features observed at many time points (doubly multivariate data); such data are called multivariate functional data. Pioneering theoretical works were Besse (1979) and Besse and Ramsay (1986), where random variables take values in a general Hilbert space. Saporta (1981) presents an analysis of multivariate functional data from the point of view of factorial methods (principal components and canonical analysis). Berrendero et al. (2011), Jacques and Preda (2014) and Panaretos et al. (2010) discussed principal component analysis for multivariate functional data (MFPCA). In this paper we extend this construction beyond principal component analysis to other projective dimension reduction techniques, namely discriminant coordinates and canonical correlation analysis for multivariate functional data.

Dimension reduction is a very active field of statistical research. We focus only on projective dimension reduction techniques (Burges 2009): principal component analysis (PCA), canonical correlation analysis (CCA) and discriminant coordinates analysis (DCA). These procedures are transformations that yield a linear projection of the data, originally in \(\mathbf {R}^p\), onto \(\mathbf {R}^k\), where \(k < p\). Along with reducing the dimension, the data are also projected in a different orientation. Altogether, this transformation presents the data in a manner that brings out the trends in them, facilitating interpretation.

The rest of this paper is organized as follows. We first review the transformation of discrete data to multivariate functional data (Sect. 2). Section 3 contains a review of principal component analysis for multivariate functional data. Sections 4 and 5 contain our extensions of existing work on univariate functional data to the multivariate case, for discriminant coordinates and CCA respectively. Section 6 contains the results of our experiments on a real data set. Conclusions are given in Sect. 7.

2 Transformation of discrete data to multivariate functional data

Let X(t) be a stochastic process with continuous parameter \(t\in I\). Moreover, assume that \(X\in L_2(I)\), where \(L_2(I)\) is a Hilbert space of square integrable functions on the interval I and that the process X(t) has the following representation:

$$\begin{aligned} X(t)=\sum _{b=0}^{B}c_b\varphi _b(t),\ t\in I, \end{aligned}$$
(1)

where \(\{\varphi _b\}\) are orthonormal basis functions, and \(c_0,c_1,\ldots ,c_B\) are the random coefficients.

Many financial, meteorological and other data are recorded at discrete moments in time. Let \(x_j\) denote the observed value of the process X(t) at the jth time point \(t_j\), where I is a compact set such that \(t_j \in I\), for \(j=1,...,J\). Then our data consist of J pairs \((t_{j},x_{j})\). These discrete data can be smoothed to a continuous function x(t), \(t \in I\) (Ramsay and Silverman 2005).

Let \(\pmb {x}=(x_1,x_2,\ldots ,x_{J})'\), \(\pmb {c}=(c_0,c_1,\ldots ,c_B)'\) and let \(\pmb {\Phi }(t)\) be the matrix of dimension \(J \times (B+1)\) containing the values \(\varphi _b(t_j)\), \(b=0,1,...,B\), \(j=1,2,...,J\). The coefficient vector \(\pmb {c}\) in (1) is estimated by the least squares method, that is, so as to minimize the function:

$$\begin{aligned} S(\pmb {c})=\left( \pmb {x} - \pmb {\Phi }(t) \pmb {c} \right) ' \left( \pmb {x} - \pmb {\Phi }(t) \pmb {c} \right) . \end{aligned}$$

Differentiating \(S(\pmb {c})\) with respect to the vector \(\pmb {c}\), we obtain the least squares estimator

$$\begin{aligned} \hat{\pmb {c}}=\left( \pmb {\Phi }'(t) \pmb {\Phi }(t) \right) ^{-1} \pmb {\Phi }'(t) \pmb {x}. \end{aligned}$$

Then

$$\begin{aligned} x(t)=\sum _{b=0}^{B}\hat{c}_b\varphi _b(t),\ t\in I. \end{aligned}$$
(2)

The degree of smoothness of the function x(t) depends on the value of B (a small value of B gives more smoothing of the curves). The optimum value of B is selected using the Bayesian information criterion (BIC):

$$\begin{aligned} {{\mathrm{BIC}}}=\ln \left( \sum _{j=1}^{J}\big (x_j - \sum _{b=0}^{B}\hat{c}_b\varphi _b(t_j)\big )^2\right) + (B+1)\left( \frac{\ln J}{J} \right) . \end{aligned}$$

We decided to use this criterion because the Akaike information criterion (AIC) is better at measuring predictive accuracy, while BIC is better at measuring goodness of fit (Berk 2008; Shmueli 2010; Sober 2002).

Let us now assume that there are n independent realizations, the ith consisting of J pairs \((t_{ij},x_{ij})\), \(j=1,...,J\), \(i=1,...,n\). These discrete data are smoothed to continuous functions of the following form:

$$\begin{aligned} x_i(t)=\sum _{b=0}^{B_i}\hat{c}_{ib}\varphi _{b}(t),\ i=1,...,n,\ t \in I. \end{aligned}$$

From among \(B_1,B_2,...,B_n\) one common value of B is chosen, namely the modal value of the numbers \(B_1,B_2,...,B_n\).
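As an illustration, this smoothing and basis-size selection step can be written in R. The following is a minimal sketch assuming an orthonormal Fourier basis on \(I=[0,T]\); the inputs `x` (an n x J matrix, one curve per row) and `tj` (the J time points) are hypothetical placeholders.

```r
# Values of the first B + 1 orthonormal Fourier basis functions at points t.
fourier_basis <- function(t, B, Tend) {
  Phi <- matrix(0, length(t), B + 1)
  Phi[, 1] <- 1 / sqrt(Tend)                   # constant function phi_0
  for (b in seq_len(B)) {
    k <- ceiling(b / 2)                        # frequency of the bth function
    if (b %% 2 == 1) {
      Phi[, b + 1] <- sqrt(2 / Tend) * sin(2 * pi * k * t / Tend)
    } else {
      Phi[, b + 1] <- sqrt(2 / Tend) * cos(2 * pi * k * t / Tend)
    }
  }
  Phi
}

# Least squares coefficients as in (2) and the BIC above, for one curve and B.
bic_for_B <- function(xj, tj, B, Tend) {
  Phi  <- fourier_basis(tj, B, Tend)
  chat <- solve(crossprod(Phi), crossprod(Phi, xj))  # (Phi'Phi)^{-1} Phi' x
  J    <- length(xj)
  log(sum((xj - Phi %*% chat)^2)) + (B + 1) * log(J) / J
}

# Select B_i for each curve by BIC, then take the modal value as the common B.
Tend <- 38; Bmax <- 12                         # assumed interval and search range
Bi <- apply(x, 1, function(xj)
  which.min(sapply(0:Bmax, function(B) bic_for_B(xj, tj, B, Tend))) - 1)
B  <- as.integer(names(which.max(table(Bi))))  # modal value of B_1, ..., B_n
```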

The set of functions \(\left\{ x_1(t),..., x_n(t):t\in I\right\} \) obtained in this way is called functional data (see Ramsay and Silverman 2005). Note that in some cases it could be interesting to discretize smooth functions. This is at the center of variable selection methods in functional data analysis (Aneiros and Vieu 2014).

So far we have been dealing with data characterized by a single feature. Our considerations can be generalized to the case of \(p\ge 2\) features. Then our data consist of n independent vector functions \(\pmb {x}_i(t)=(x_{i1}(t),x_{i2}(t),...,x_{ip}(t))'\), \(t\in I\), \(i=1,...,n\). The data \(\left\{ \pmb {x}_1(t),..., \pmb {x}_n(t):t\in I\right\} \) will be called multivariate functional data. Multivariate functional data can conveniently be treated as realizations of a p-dimensional stochastic process \(\pmb {X}(t)=(X_1(t),X_2(t),...,X_p(t))'\) with continuous parameter \(t \in I\). We will further assume that \(\pmb {X}\in L_2^p(I)\), where \(L_2^p(I)\) is the Hilbert space of p-dimensional vector functions with square integrable components on the interval I, equipped with the following inner product:

$$\begin{aligned} {<}\pmb {u},\pmb {v}{>}\,=\int _I \pmb {u}'(t)\pmb {v}(t)dt. \end{aligned}$$

We consider the case where the dth component of the process \(\pmb {X}(t)\) can be represented by a finite number of orthonormal basis functions \(\{\varphi _b\}\):

$$\begin{aligned} X_d(t)=\sum _{b=0}^{B_d}c_{db}\varphi _b(t),\ t\in I,\ d=1,2,...,p, \end{aligned}$$

where \(c_{db}\) are random variables. Let

$$\begin{aligned}&\pmb {c}=(c_{10},...,c_{1B_1},...,c_{p0},...,c_{pB_p})',\nonumber \\&\pmb {\Phi }(t)= \left[ \begin{array}{cccc} \pmb {\varphi }'_{B_1}(t) & \pmb {0} & \ldots & \pmb {0} \\ \pmb {0} & \pmb {\varphi }'_{B_2}(t) & \ldots & \pmb {0} \\ \vdots & \vdots & \ddots & \vdots \\ \pmb {0} & \pmb {0} & \ldots & \pmb {\varphi }'_{B_p}(t) \end{array} \right] , \end{aligned}$$
(3)

where \(\pmb {\varphi }_{B_d}(t)=(\varphi _{0}(t),...,\varphi _{B_d}(t))'\), \(d=1,...,p\). Then

$$\begin{aligned} \pmb {X}(t)=\pmb {\Phi }(t) \pmb {c},\ t \in I. \end{aligned}$$
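For illustration, the block-diagonal matrix \(\pmb {\Phi }(t)\) of (3) can be built numerically as in the following sketch, which reuses fourier_basis() from the sketch above; the vector of degrees B = c(B_1, ..., B_p) and the coefficient vector c_hat are illustrative assumptions.

```r
# Evaluate the p x (K + p) block-diagonal matrix Phi(t0) at a single point t0.
Phi_at <- function(t0, B, Tend) {             # B = c(B_1, ..., B_p)
  p    <- length(B)
  Phi  <- matrix(0, p, sum(B + 1))
  col0 <- 0
  for (d in seq_len(p)) {                     # place phi'_{B_d}(t0) in row d
    Phi[d, col0 + 1:(B[d] + 1)] <- fourier_basis(t0, B[d], Tend)
    col0 <- col0 + B[d] + 1
  }
  Phi
}
# X_t0 <- Phi_at(t0, B, Tend) %*% c_hat       # the representation X(t) = Phi(t) c
```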

3 Principal component analysis for multivariate functional data

The idea of principal component analysis is to reduce the dimensionality of a data set consisting of a large number of correlated variables, while retaining as much as possible of the variation present in the data set. This is achieved by transforming to a new set of variables, the principal components, which are uncorrelated and ordered so that the first few retain most of the variation present in all of the original variables. Suppose that we observe a p-dimensional random vector \(\pmb {X}=(X_1,X_2,...,X_p)' \in \mathbb {R}^p\). In the first step we look for a linear combination \(U_1 = u_{11} X_1 + u_{12} X_2 + ... + u_{1p} X_p = \pmb {u}_1'\pmb {X}\) of the elements of the vector \(\pmb {X}\) having maximum variance. The variable \(U_1\) is called the first principal component. Next, we look for a linear combination \(U_2=\pmb {u}_2'\pmb {X}\), uncorrelated with the first principal component \(U_1\), having maximum variance, and so on, so that at the kth stage a linear combination \(U_k=\pmb {u}_k'\pmb {X}\), called the kth principal component, is found that has maximum variance subject to being uncorrelated with the first \(k-1\) principal components (Jolliffe 2002). The observations can then be presented graphically as points on the plane \((U_1,U_2)\). The functional version of PCA (FPCA) provides a more informative way of looking at the variability structure of the variance-covariance operator for one-dimensional functional data (Górecki and Krzyśko 2012). In this section we present PCA for multivariate functional data (Jacques and Preda 2014).

Without loss of generality we will further assume that \({{\mathrm{E}}}(\pmb {X})=\pmb {0}\). In principal component analysis in the multivariate functional case, we are interested in finding the inner product

$$\begin{aligned} U={<}\pmb {u},\pmb {X}{>}\, = \int _{I}\pmb {u}'(t) \pmb {X}(t)dt \end{aligned}$$

having maximal variance over all \(\pmb {u}\in L_2^p(I)\) such that \({<}\pmb {u},\pmb {u}{>}=1\). Let

$$\begin{aligned} \lambda _1=\sup _{\pmb {u} \in L_2^p(I)} {{\mathrm{Var}}}({<}\pmb {u},\pmb {X}{>})\,= {{\mathrm{Var}}}({<}\pmb {u}_1,\pmb {X}{>}), \end{aligned}$$

where \({<}\pmb {u}_1,\pmb {u}_1{>}\,=1\). The inner product \(U_1\,={<}\pmb {u}_1,\pmb {X}{>}\) will be called the first principal component, and the vector function \(\pmb {u}_1\) will be called the first vector weight function. Subsequently we look for the second principal component \(U_2={<}\pmb {u}_2,\pmb {X}{>}\), which maximizes \({{\mathrm{Var}}}({<}\pmb {u},\pmb {X}{>})\), is such that \({<}\pmb {u}_2,\pmb {u}_2{>}=1\), and is not correlated with the first functional principal component \(U_1\), i.e. is subject to the restriction \({<}\pmb {u}_1,\pmb {u}_2{>}=0\).

In general, the kth functional principal component \(U_k={<}\pmb {u}_k,\pmb {X}{>}\) satisfies the conditions:

$$\begin{aligned} \lambda _k&=\sup _{\pmb {u} \in L_2^p(I)} {{\mathrm{Var}}}({<}\pmb {u},\pmb {X}{>})= {{\mathrm{Var}}}({<}\pmb {u}_k,\pmb {X}{>}),\\&{<}\pmb {u}_{\kappa _1},\pmb {u}_{\kappa _2}{>}\,=\delta _{{\kappa _1}{\kappa _2}}, \quad \kappa _1, \kappa _2=1,...,k, \end{aligned}$$

where

$$\begin{aligned} \delta _{{\kappa _1}{\kappa _2}}= {\left\{ \begin{array}{ll} 1&{}\,\,\,\text {if } \kappa _1=\kappa _2\\ 0&{}\,\,\,\text {if } \kappa _1\ne \kappa _2. \end{array}\right. } \end{aligned}$$

The expression \((\lambda _k,\pmb {u}_k(t))\) will be called the kth principal system of the process \(\pmb {X}(t)\).

In Sect. 2 we showed that the process \(\pmb {X}(t)\) can be represented as \(\pmb {X}(t)=\pmb {\Phi }(t) \pmb {c}\), \(t \in I\). Now let us consider the principal components of the random vector \(\pmb {c}\). From \({{\mathrm{E}}}(\pmb {X})=\pmb {0}\) we have \({{\mathrm{E}}}(\pmb {c})=\pmb {0}\). Let us denote \({{\mathrm{Var}}}(\pmb {c})=\pmb {\Sigma }\). The kth principal component \(U^{*}_k={<}\pmb {\omega }_k, \pmb {c}{>}\) of this vector satisfies the conditions:

$$\begin{aligned} \gamma _k&=\sup _{\pmb {\omega } \in \mathbb {R}^{K+p}} {{\mathrm{Var}}}({<}\pmb {\omega },\pmb {c}{>})= \sup _{\pmb {\omega } \in \mathbb {R}^{K+p}} \pmb {\omega }'{{\mathrm{Var}}}(\pmb {c}) \pmb {\omega }\\&= \sup _{\pmb {\omega } \in \mathbb {R}^{K+p}} \pmb {\omega }'\pmb {\Sigma } \pmb {\omega }=\pmb {\omega }_k'\pmb {\Sigma } \pmb {\omega }_k,\\ \pmb {\omega }'_{\kappa _1}\pmb {\omega }_{\kappa _2}&=\delta _{{\kappa _1}{\kappa _2}}, \end{aligned}$$

where \(\kappa _1 ,\kappa _2=1,...,k\), \(K=B_1+...+B_p\). The expression \((\gamma _k,\pmb {\omega }_k)\) will be called the kth principal system of vector \(\pmb {c}\).

Determining the kth principal system of the vector \(\pmb {c}\) is equivalent to solving for the eigenvalues and corresponding eigenvectors of the covariance matrix \(\pmb {\Sigma }\) of that vector, standardized so that \(\pmb {\omega }'_{\kappa _1}\pmb {\omega }_{\kappa _2}=\delta _{{\kappa _1}{\kappa _2}}\).

Theorem 1

The kth principal system \((\lambda _k,\pmb {u}_k(t))\) of the stochastic process \(\pmb {X}(t)\) is related to the kth principal system \((\gamma _k,\pmb {\omega }_k)\) of the random vector \(\pmb {c}\) by the equations:

$$\begin{aligned} \lambda _k = \gamma _k,\ \pmb {u}_k(t)=\pmb {\Phi }(t)\pmb {\omega }_k,\ t \in I, \end{aligned}$$

where \(k=1,...,s\) and \(s={{\mathrm{rank}}}(\pmb {\Sigma })\).

Proof

It may be assumed (Ramsay and Silverman 2005) that the vector weight function \(\pmb {u}(t)\) and the process \(\pmb {X}(t)\) are in the same space, i.e. the function \(\pmb {u}(t)\) can be written in the form:

$$\begin{aligned} \pmb {u}(t)= \pmb {\Phi }(t) \pmb {\omega }, \end{aligned}$$

where \(\pmb {\omega } \in \mathbb {R}^{K+p}\). Then

$$\begin{aligned} {<} \pmb {u}, \pmb {X}{>}&= \int _I\pmb {u}'(t)\pmb {X}(t)dt=\int _I\pmb {\omega }'\pmb {\Phi }'(t)\pmb {\Phi }(t)\pmb {c}dt \\&=\pmb {\omega }'\int _I\pmb {\Phi }'(t)\pmb {\Phi }(t)dt\pmb {c}= \pmb {\omega }' \pmb {I}_{K+p} \pmb {c} = \pmb {\omega }' \pmb {c}, \end{aligned}$$

Hence

$$\begin{aligned} {{\mathrm{E}}}({<}\pmb {u}, \pmb {X}{>})&= \pmb {\omega }' {{\mathrm{E}}}(\pmb {c}) = \pmb {\omega }' \pmb {0} =0,\\ {{\mathrm{Var}}}({<}\pmb {u}, \pmb {X}{>})&= \pmb {\omega }' {{\mathrm{E}}}(\pmb {c} \pmb {c}') \pmb {\omega } = \pmb {\omega }' \pmb {\Sigma } \pmb {\omega }. \end{aligned}$$

Let us consider the first functional principal component of process \(\pmb {X}(t)\):

$$\begin{aligned} \lambda _1=\sup _{\pmb {u} \in L_2^p(I)} {{\mathrm{Var}}}({<}\pmb {u},\pmb {X}{>})= {{\mathrm{Var}}}({<}\pmb {u}_1,\pmb {X}{>}), \end{aligned}$$

where \({<}\pmb {u}_1,\pmb {u}_1{>}=1\). This is equivalent to saying that

$$\begin{aligned} \gamma _1=\sup _{\pmb {\omega } \in \mathbb {R}^{K+p}} {{\mathrm{Var}}}({<}\pmb {\omega },\pmb {c}{>})= \sup _{\pmb {\omega } \in \mathbb {R}^{K+p}} \pmb {\omega }'{{\mathrm{Var}}}(\pmb {c}) \pmb {\omega } = \pmb {\omega }_1'\pmb {\Sigma } \pmb {\omega }_1, \end{aligned}$$

where \(\pmb {\omega }'_1\pmb {\omega }_1=1\).

This is the definition of the first principal component of the random vector \(\pmb {c}\). On the other hand, if we begin with the first principal system of the random vector \(\pmb {c}\) defined by \((\gamma _1,\pmb {\omega }_1)\), we will obtain the first principal system for the process \(\pmb {X}(t)\) from the equations

$$\begin{aligned} \lambda _1 = \gamma _1,\ \pmb {u}_1(t)=\pmb {\Phi }(t)\pmb {\omega }_1. \end{aligned}$$

We may extend these considerations to the second principal system and so on. \(\square \)

Principal component analysis for the random vector \(\pmb {c}\) is based on the matrix \(\pmb {\Sigma }\). In practice this matrix is unknown. We estimate it on the basis of n independent realizations \(\pmb {x}_1(t),\pmb {x}_2(t),...,\pmb {x}_n(t)\) of the random process \(\pmb {X}(t)\), of the form \(\pmb {x}_i(t)=\pmb {\Phi }(t) \hat{\pmb {c}}_i\), where the vectors \(\hat{\pmb {c}}_i\) are centered, \(i=1,2,...,n\).

Let \(\hat{\pmb {C}}=(\hat{\pmb {c}}_1,\hat{\pmb {c}}_2,...,\hat{\pmb {c}}_n)'.\) Then

$$\begin{aligned} \hat{\pmb {\Sigma }}= \frac{1}{n} \hat{\pmb {C}}' \hat{\pmb {C}}. \end{aligned}$$

Let \(\hat{\gamma }_1\ge \hat{\gamma }_2\ge ...\ge \hat{\gamma }_s\) be non-zero eigenvalues of matrix \(\hat{\pmb {\Sigma }}\), and \(\hat{\pmb {\omega }}_1, \hat{\pmb {\omega }}_2,..., \hat{\pmb {\omega }}_s\) the corresponding eigenvectors, where \(s={{\mathrm{rank}}}(\hat{\pmb {\Sigma }})\).

Moreover the kth principal system of the random process \(\pmb {X}(t)\) determined from a sample has the following form:

$$\begin{aligned} (\hat{\lambda }_k = \hat{\gamma }_k, \hat{\pmb {u}}_k(t)=\pmb {\Phi }(t)\hat{\pmb {\omega }}_k),\ k=1,...,s. \end{aligned}$$

Hence the coefficients of the projection of the ith realization \(\pmb {x}_{i}(t)\) of process \(\pmb {X}(t)\) on the kth functional principal component are equal to:

$$\begin{aligned} \hat{U}_{ik}&={<}\hat{\pmb {u}}_k,\pmb {x}_{i}{>}=\int _I\hat{\pmb {\omega }}_k'\pmb {\Phi }'(t)\pmb {\Phi }(t)\hat{\pmb {c}}_idt\\&=\hat{\pmb {\omega }}_k'\int _I\pmb {\Phi }'(t)\pmb {\Phi }(t)dt\hat{\pmb {c}}_i= \hat{\pmb {\omega }}_k'\hat{\pmb {c}}_i, \end{aligned}$$

for \(i=1,2,...,n\), \(k=1,2,...,s\). Finally the coefficients of the projection of the ith realization \(\pmb {x}_{i}(t)\) of the process \(\pmb {X}(t)\) on the plane of the first two functional principal components from the sample are equal to \((\hat{\pmb {\omega }}'_1 \hat{\pmb {c}}_i, \hat{\pmb {\omega }}'_2 \hat{\pmb {c}}_i)\), \(i=1,2,...,n.\)
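A minimal R sketch of these sample computations follows, assuming the centered coefficient vectors \(\hat{\pmb {c}}_i\) are stacked in an n x (K + p) matrix Chat (a hypothetical input):

```r
mfpca <- function(Chat) {
  Sigma <- crossprod(Chat) / nrow(Chat)               # Sigma-hat = (1/n) C'C
  eig   <- eigen(Sigma, symmetric = TRUE)
  keep  <- which(eig$values > 1e-10)                  # s = rank(Sigma-hat)
  omega <- eig$vectors[, keep, drop = FALSE]          # eigenvectors omega_k
  list(gamma  = eig$values[keep],                     # eigenvalues gamma_k
       omega  = omega,
       scores = Chat %*% omega)                       # U_ik = omega_k' c_i
}
```

The kth weight function is then \(\hat{\pmb {u}}_k(t)=\pmb {\Phi }(t)\hat{\pmb {\omega }}_k\), and the first two columns of `scores` give the planar representation of the n objects.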

4 Discriminant coordinates for multivariate functional data

Now let us consider the case where the samples originate from L groups. We would often like to present them graphically, to see their configuration or to eliminate outlying observations. However, it may be difficult to produce such a presentation even if only three features are observed. A different method must therefore be sought for presenting multidimensional data originating from multiple groups. To make the task easier, in the first step every p-dimensional observation \(\pmb {X}=(X_1,X_2,...,X_p)' \in \mathbb {R}^p\) can be transformed into a one-dimensional observation \(U_1 = u_{11} X_1 + u_{12} X_2 + ... + u_{1p} X_p = \pmb {u}_1'\pmb {X}\), and the resulting one-dimensional observations can be presented graphically as points on a straight line. In the second step we can define a second linear combination \(U_2=\pmb {u}_2'\pmb {X}\), uncorrelated with the first, and present the observations graphically as points on a plane \((U_1,U_2)\). The space of discriminant coordinates is convenient for the use of various classification methods (methods of discriminant analysis). When \(L=2\) we obtain only one discriminant coordinate, coinciding with the well-known Fisher's linear discriminant function (Fisher 1936). The functional case of discriminant coordinate analysis (FDCA) and its kernel variant (KFDCA) are also well known (Górecki et al. 2014). In this section we propose FDCA for multivariate functional data (MFDCA). Let \(\pmb {x}_{l1}(t),\pmb {x}_{l2}(t),...,\pmb {x}_{ln_l}(t)\) be \(n_l\) independent realizations of a p-dimensional stochastic process \(\pmb {X}(t)\) belonging to the lth class, where \(l=1,2,\ldots ,L\). Our purpose is to construct discriminant coordinates based on multivariate functional data, i.e. to construct

$$\begin{aligned} U={<}\pmb {u},\pmb {X}{>} = \int _{I}\pmb {u}'(t) \pmb {X}(t)dt \end{aligned}$$

such that its between-class variance is maximal relative to the total variance, where \(\pmb {u} \in L_2^p(I)\). The vector function \(\pmb {u}(t)=(u_1(t),u_2(t),...,u_p(t))'\) is called the vector weight function.

More precisely, the first functional discriminant coordinate \(U_1=\,{<}\pmb {u}_1,\pmb {X}{>}\) is defined as

$$\begin{aligned} \lambda _1&=\sup _{\pmb {u} \in L_2^p(I)}\frac{{{\mathrm{Var}}}_B({<}\pmb {u},\pmb {X}{>})}{{{\mathrm{Var}}}_T({<}\pmb {u},\pmb {X}{>})}\\&= \frac{{{\mathrm{Var}}}_B({<}\pmb {u}_1,\pmb {X}{>})}{{{\mathrm{Var}}}_T({<}\pmb {u}_1,\pmb {X}{>})}, \end{aligned}$$

subject to the constraint

$$\begin{aligned} {{\mathrm{Var}}}_T({<}\pmb {u}_1,\pmb {X}{>})=1, \end{aligned}$$
(4)

where \({{\mathrm{Var}}}_B({<}\pmb {u},\pmb {X}{>})\) and \({{\mathrm{Var}}}_T({<}\pmb {u},\pmb {X}{>})\) are respectively the between-class and total variance of discriminant coordinate \(U_1\). Condition (4) ensures the uniqueness of the first discriminant coordinate \(U_1\).

Similarly we can construct the kth functional discriminant coordinate

$$\begin{aligned} U_k=\,{<}\pmb {u}_k,\pmb {X}{>}, \end{aligned}$$

where the vector weight function \(\pmb {u}_k(t)\) is defined as

$$\begin{aligned} \lambda _k&=\sup _{\pmb {u} \in L_2^p(I)}\frac{{{\mathrm{Var}}}_B({<}\pmb {u},\pmb {X}{>})}{{{\mathrm{Var}}}_T({<}\pmb {u},\pmb {X}{>})}\\&= \frac{{{\mathrm{Var}}}_B({<}\pmb {u}_k,\pmb {X}{>})}{{{\mathrm{Var}}}_T({<}\pmb {u}_k,\pmb {X}{>})}, \end{aligned}$$

subject to the constraint

$$\begin{aligned} {{\mathrm{Var}}}_T({<}\pmb {u}_k,\pmb {X}{>})=1. \end{aligned}$$

Moreover the kth discriminant coordinate \(U_k\) is not correlated with the first \(k-1\) discriminant coordinates. The expression \((\lambda _k,\pmb {u}_k(t))\) will be called the kth discriminant system of the process \(\pmb {X}(t)\).

Let us recall that the process \(\pmb {X}(t)\) can be represented as \(\pmb {X}(t)=\pmb {\Phi }(t) \pmb {c}, t \in I\). Now let us consider the discriminant coordinates of the random vector \(\pmb {c}\). The kth discriminant coordinate \(U^{*}_k=\,{<}\pmb {\omega }_k, \pmb {c}{>}\) of this vector satisfies the condition:

$$\begin{aligned} \gamma _k&=\sup _{\pmb {\omega } \in \mathbb {R}^{K+p}} {{\mathrm{Var}}}_B({<}\pmb {\omega },\pmb {c}{>})= {{\mathrm{Var}}}_B({<}\pmb {\omega }_k,\pmb {c}{>}) \\&= \pmb {\omega }_k'{{\mathrm{Var}}}_B(\pmb {c}) \pmb {\omega }_k = \pmb {\omega }_k' \pmb {B} \pmb {\omega }_k, \end{aligned}$$

subject to the restriction

$$\begin{aligned} {{\mathrm{Var}}}_T({<}\pmb {\omega }_k,\pmb {c}{>}) = \pmb {\omega }_k'{{\mathrm{Var}}}_T(\pmb {c}) \pmb {\omega }_k = \pmb {\omega }_k' \pmb {T} \pmb {\omega }_k = 1. \end{aligned}$$

Additionally the kth discriminant coordinate \(U^{*}_k\) is not correlated with the first \(k-1\) discriminant coordinates, i.e.

$$\begin{aligned} \pmb {\omega }_{\kappa _1}'\pmb {T}\pmb {\omega }_{\kappa _2}=\delta _{{\kappa _1}{\kappa _2}}, \qquad \kappa _1, \kappa _2=1,...,k. \end{aligned}$$

The expression \((\gamma _k,\pmb {\omega }_k)\) will be called the kth discriminant system of the random vector \(\pmb {c}\).

Theorem 2

The kth discriminant system \((\lambda _k,\pmb {u}_k(t))\) of the stochastic process \(\pmb {X}(t)\) is related to the kth discriminant system \((\gamma _k,\pmb {\omega }_k)\) of the random vector \(\pmb {c}\) by the equations:

$$\begin{aligned} \lambda _k = \gamma _k,\ \pmb {u}_k(t)=\pmb {\Phi }(t)\pmb {\omega }_k,\ t \in I, \end{aligned}$$

where \(k=1,...,s\), \(s=\min (K+p,L-1)\).

Proof

We assume that the vector weight function \(\pmb {u}(t)\) and the process \(\pmb {X}(t)\) are in the same space, i.e. the function \(\pmb {u}(t)\) can be written in the form:

$$\begin{aligned} \pmb {u}(t)= \pmb {\Phi }(t) \pmb {\omega } \end{aligned}$$

where \(\pmb {\omega } \in \mathbb {R}^{K+p}\). Then

$$\begin{aligned} {<}\pmb {u}, \pmb {X}{>} \,= \pmb {\omega }' \pmb {c}. \end{aligned}$$

Hence the between-class variance of the inner product \({<} \pmb {u}, \pmb {X}{>}\) is

$$\begin{aligned} {{\mathrm{Var}}}_B({<}\pmb {u}, \pmb {X}{>}) = \pmb {\omega }' {{\mathrm{Var}}}_B(\pmb {c}) \pmb {\omega }, \end{aligned}$$

and the total variance

$$\begin{aligned} {{\mathrm{Var}}}_T({<}\pmb {u}, \pmb {X}{>}) = \pmb {\omega }' {{\mathrm{Var}}}_T(\pmb {c}) \pmb {\omega }, \end{aligned}$$

where \({{\mathrm{Var}}}_B(\pmb {c})\) and \({{\mathrm{Var}}}_T(\pmb {c})\) are, respectively, the between-class and total sums of squares and products matrices of the vector \(\pmb {c}\).

For the first functional discriminant coordinate of the process \(\pmb {X}(t)\) we have:

$$\begin{aligned} \lambda _1&=\sup _{\pmb {u} \in L_2^p(I)}\frac{{{\mathrm{Var}}}_B({<}\pmb {u},\pmb {X}{>})}{{{\mathrm{Var}}}_T({<}\pmb {u},\pmb {X}{>})}\\&=\sup _{\pmb {\omega } \in \mathbb {R}^{K+p}}\frac{\pmb {\omega }'{{\mathrm{Var}}}_B(\pmb {c}) \pmb {\omega }}{\pmb {\omega }'{{\mathrm{Var}}}_T(\pmb {c}) \pmb {\omega }} = \frac{\pmb {\omega }_1'{{\mathrm{Var}}}_B(\pmb {c}) \pmb {\omega }_1}{\pmb {\omega }_1'{{\mathrm{Var}}}_T(\pmb {c}) \pmb {\omega }_1}, \end{aligned}$$

where \(\pmb {\omega }_1'{{\mathrm{Var}}}_T(\pmb {c})\pmb {\omega }_1=1\). This is equivalent to

$$\begin{aligned} \gamma _1=\sup _{\pmb {\omega } \in \mathbb {R}^{K+p}} {{\mathrm{Var}}}_B({<}\pmb {\omega },\pmb {c}{>})= \sup _{\pmb {\omega } \in \mathbb {R}^{K+p}} \pmb {\omega }'{{\mathrm{Var}}}_B(\pmb {c}) \pmb {\omega } = \pmb {\omega }_1'\pmb {B} \pmb {\omega }_1, \end{aligned}$$

where \(\pmb {\omega }'_1\pmb {T}\pmb {\omega }_1=1\).

This meets the definition of the first discriminant coordinate of the random vector \(\pmb {c}\). On the other hand, if the first discriminant system \((\gamma _1,\pmb {\omega }_1)\) defines the first discriminant coordinate of the random vector \(\pmb {c}\), we will obtain the first discriminant system for the process \(\pmb {X}(t)\) from the equations

$$\begin{aligned} \lambda _1 = \gamma _1,\ \pmb {u}_1(t)=\pmb {\Phi }(t)\pmb {\omega }_1. \end{aligned}$$

We may extend these considerations to the second discriminant system and so on.

\(\square \)

The matrices \({{\mathrm{Var}}}_B(\pmb {c})\) and \({{\mathrm{Var}}}_T(\pmb {c})\) are unknown and must be estimated based on the sample. Let \(\pmb {x}_{l1}(t), \pmb {x}_{l2}(t),...,\pmb {x}_{ln_l}(t)\) be a sample belonging to the lth class, where \(l=1,2,\ldots ,L\). The function \(\pmb {x}_{li}(t)\) has the form

$$\begin{aligned} \pmb {x}_{li}(t)=\pmb {\Phi }(t) \hat{\pmb {c}}_{li}, \end{aligned}$$

where \(\hat{\pmb {c}}_{li}=\left( \hat{c}_{10}^{(li)},...,\hat{c}_{1B_1}^{(li)},...,\hat{c}_{p0}^{(li)},...,\hat{c}_{pB_p}^{(li)}\right) '\), \(i=1,2,...,n_l\), \(l=1,2,...,L\). Let

$$\begin{aligned} \bar{\pmb {c}}_{l}=\frac{1}{n_l} \sum _{i=1}^{n_l} \hat{\pmb {c}}_{li}. \end{aligned}$$

Then

$$\begin{aligned} \hat{{{\mathrm{Var}}}}_B(\pmb {c})=\hat{\pmb {B}}&= \frac{1}{L} \sum _{l=1}^{L} n_l \bar{\pmb {c}}_{l} \bar{\pmb {c}}_{l}',\\ \hat{{{\mathrm{Var}}}}_T(\pmb {c})&=\hat{\pmb {T}}=\frac{1}{n} \sum _{l=1}^{L} \sum _{i=1}^{n_l} \hat{\pmb {c}}_{li} \hat{\pmb {c}}_{li}', \end{aligned}$$

where \(n=\sum _{l=1}^Ln_l\). Next we find non-zero eigenvalues \(\hat{\gamma }_1\ge \hat{\gamma }_2\ge ...\ge \hat{\gamma }_s\) and the corresponding eigenvectors \(\hat{\pmb {\omega }}_1, \hat{\pmb {\omega }}_2,..., \hat{\pmb {\omega }}_s\) of matrix \(\hat{\pmb {T}}^{-1}\hat{\pmb {B}}\), where \(s=\min (K+p,L-1)\). Furthermore the kth discriminant system of the random process \(\pmb {X}(t)\) has the following form:

$$\begin{aligned} (\hat{\lambda }_k = \hat{\gamma }_k, \hat{\pmb {u}}_k(t)=\pmb {\Phi }(t)\hat{\pmb {\omega }}_k),\ k=1,...,s=\min (K+p,L-1). \end{aligned}$$

Hence the coefficients of the projection of the ith realization \(\pmb {x}_{li}(t)\) of the process \(\pmb {X}(t)\) belonging to the lth class on the kth functional discriminant coordinate are equal to:

$$\begin{aligned} \hat{U}_{lik}={<}\hat{\pmb {u}}_k(t),\pmb {x}_{li}(t){>}=\hat{\pmb {\omega }}'_k \hat{\pmb {c}}_{li}, \end{aligned}$$

for \(i=1,2,...,n_l\), \(k=1,2,...,s\), \(l=1,2,...,L\).
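The whole procedure can be sketched in R as follows, with Chat the n x (K + p) matrix of overall-centered coefficient vectors \(\hat{\pmb {c}}_{li}\) (stacked over classes) and cls a factor of class labels; both are assumed inputs.

```r
mfdca <- function(Chat, cls) {
  n  <- nrow(Chat)
  L  <- nlevels(cls)
  nl <- as.vector(table(cls))                     # class sizes n_1, ..., n_L
  cbar <- rowsum(Chat, cls) / nl                  # class means of the c_li
  Bhat <- crossprod(sqrt(nl) * cbar) / L          # (1/L) sum n_l cbar_l cbar_l'
  That <- crossprod(Chat) / n                     # (1/n) sum c_li c_li'
  eig  <- eigen(solve(That, Bhat))                # eigenproblem of T^{-1} B
  keep <- seq_len(min(ncol(Chat), L - 1))         # s = min(K + p, L - 1)
  omega <- Re(eig$vectors[, keep, drop = FALSE])  # discriminant weights omega_k
  list(gamma  = Re(eig$values[keep]),
       scores = Chat %*% omega)                   # U_lik = omega_k' c_li
}
```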

5 Canonical correlation analysis for multivariate functional data

Suppose now that we are observing two random vectors \(\pmb {Y}=(Y_1,Y_2,...,Y_p)' \in \mathbb {R}^p\) and \(\pmb {X}=(X_1,X_2,...,X_q)' \in \mathbb {R}^q\) and looking for the relationship between them. This is one of the main problems of canonical correlation analysis. We search for weight vectors \(\pmb {u} \in \mathbb {R}^p\) and \(\pmb {v} \in \mathbb {R}^q\) such that the linear combinations \(U_1 = u_{11} Y_1 + u_{12} Y_2 + ... + u_{1p} Y_p = \pmb {u}_1'\pmb {Y}\) and \(V_1 = v_{11} X_1 + v_{12} X_2 + ... + v_{1q} X_q = \pmb {v}_1'\pmb {X}\), called the first pair of canonical variables, are maximally correlated.

Canonical correlation analysis has been extended to the case of multivariate time series (Brillinger 2001), under the assumption of stationarity, and an extension of canonical correlation to functional data was proposed in Leurgans et al. (1993), where the need for regularization was pointed out. He et al. (2000) showed that random processes with finite basis expansions have simple canonical structures, analogous to the case of random vectors. This motivates implementing the regularization by projecting the random processes onto a finite number of basis functions. The idea of projecting processes onto a finite number of basis functions, in a prespecified orthonormal basis, was discussed in He et al. (2004). In this section, we consider canonical correlations for multivariate functional data. The proposed method is a generalization of the method presented in He et al. (2004). Another generalization is presented by Dubin and Müller (2005).

Let \(\pmb {Y}(t)\) and \(\pmb {X}(t)\) be stochastic processes. We will further assume that \(\pmb {Y}\in L_2^p(I_1)\), \(\pmb {X}\in L_2^q(I_2)\) and that each component \(Y_g(t)\) of the process \(\pmb {Y}(t)\) and \(X_h(t)\) of the process \(\pmb {X}(t)\) can be represented by a finite number of orthonormal basis functions \(\{\varphi _e\}\) and \(\{\varphi _f\}\) respectively:

$$\begin{aligned} Y_g(t)=\sum _{e=0}^{E_g}\alpha _{ge}\varphi _e(t), t\in I_1, g=1,2,...,p, \end{aligned}$$
$$\begin{aligned} X_h(t)=\sum _{f=0}^{F_h}\beta _{hf}\varphi _f(t), t\in I_2, h=1,2,...,q. \end{aligned}$$

Moreover, let \({{\mathrm{E}}}(\pmb {Y})=\pmb {0}\) and \({{\mathrm{E}}}(\pmb {X})=\pmb {0}\). This does not cause loss of generality, because the functional canonical variables are calculated based on the covariance functions of the processes \(\pmb {Y}(t)\) and \(\pmb {X}(t)\).

We introduce the following notation:

$$\begin{aligned} \pmb {\alpha }&=(\alpha _{10},...,\alpha _{1E_1},...,\alpha _{p0},...,\alpha _{pE_p})',\\ \pmb {\beta }&=(\beta _{10},...,\beta _{1F_1},...,\beta _{q0},...,\beta _{qF_q})',\\ \pmb {\Phi }_1(t)&= \left[ \begin{array}{cccc} \pmb {\varphi }'_{E_1}(t) & \pmb {0} & \ldots & \pmb {0} \\ \pmb {0} & \pmb {\varphi }'_{E_2}(t) & \ldots & \pmb {0} \\ \vdots & \vdots & \ddots & \vdots \\ \pmb {0} & \pmb {0} & \ldots & \pmb {\varphi }'_{E_p}(t) \end{array} \right] ,\\ \pmb {\Phi }_2(t)&= \left[ \begin{array}{cccc} \pmb {\varphi }'_{F_1}(t) & \pmb {0} & \ldots & \pmb {0} \\ \pmb {0} & \pmb {\varphi }'_{F_2}(t) & \ldots & \pmb {0} \\ \vdots & \vdots & \ddots & \vdots \\ \pmb {0} & \pmb {0} & \ldots & \pmb {\varphi }'_{F_q}(t) \end{array} \right] , \end{aligned}$$

where \(\pmb {\varphi }_{E_1},...,\pmb {\varphi }_{E_p}\) and \(\pmb {\varphi }_{F_1},...,\pmb {\varphi }_{F_q}\) are orthonormal basis functions of space \(L_2(I_1)\) and \(L_2(I_2)\), respectively, and \(K_1 = E_1+E_2+...+E_p, K_2 = F_1+F_2+...+F_q\). Using the above matrix notation the processes \(\pmb {Y}(t)\) and \(\pmb {X}(t)\) can be represented as:

$$\begin{aligned} \pmb {Y}(t)=\pmb {\Phi }_1(t) \pmb {\alpha },\quad \pmb {X}(t)=\pmb {\Phi }_2(t) \pmb {\beta }. \end{aligned}$$

Functional canonical variables U and V for stochastic processes \(\pmb {Y}(t)\) and \(\pmb {X}(t)\) are defined as follows

$$\begin{aligned} U={<}\pmb {u},\pmb {Y}{>} = \int _{I_1}\pmb {u}'(t) \pmb {Y}(t)dt,\quad V={<}\pmb {v},\pmb {X}{>} = \int _{I_2}\pmb {v}'(t) \pmb {X}(t)dt, \end{aligned}$$

where the vector functions \(\pmb {u}(t)\) and \(\pmb {v}(t)\) are called the vector weight functions. The weight functions \(\pmb {u}(t)\) and \(\pmb {v}(t)\) are chosen to maximize the coefficient

$$\begin{aligned} \rho = \frac{{{\mathrm{Cov}}}(U,V)}{\sqrt{{{\mathrm{Var}}}(U) {{\mathrm{Var}}}(V)}} \in (0,1], \end{aligned}$$

subject to the constraint that

$$\begin{aligned} {{\mathrm{Var}}}(U)={{\mathrm{Var}}}(V)=1. \end{aligned}$$
(5)

The coefficient \(\rho \) is called the canonical correlation coefficient. However, simply carrying out this maximization does not produce a meaningful result: the correlation \(\rho \) achieved by unrestricted functions \(\pmb {u}(t)\) and \(\pmb {v}(t)\) is equal to 1, and the resulting canonical weight functions do not convey any meaningful information about the data. This clearly demonstrates the need for a technique involving smoothing. A straightforward way of introducing smoothing is to modify the constraints (5) by adding roughness penalty terms, to give (Ramsay and Silverman 2005):

$$\begin{aligned} {{\mathrm{Var}}}\left( U^{(N)}\right) = {{\mathrm{Var}}}\left( \int _{I_1}\pmb {u}'(t) \pmb {Y}(t)dt\right) + \lambda {{\mathrm{PEN}}}_2(\pmb {u})=1,\end{aligned}$$
(6)
$$\begin{aligned} {{\mathrm{Var}}}\left( V^{(N)}\right) = {{\mathrm{Var}}}\left( \int _{I_2}\pmb {v}'(t) \pmb {X}(t)dt\right) + \lambda {{\mathrm{PEN}}}_2(\pmb {v})=1, \end{aligned}$$
(7)

where the roughness function \({{\mathrm{PEN}}}_2\) is the integrated squared second derivative

$$\begin{aligned} {{\mathrm{PEN}}}_2(\pmb {u}) = \int _{I_1} \left( \frac{\partial ^2 \pmb {u}(t)}{\partial t^2}\right) ' \frac{\partial ^2 \pmb {u}(t)}{\partial t^2} dt. \end{aligned}$$

Assuming that the vector weight function \(\pmb {u}(t)\) and the process \(\pmb {Y}(t)\) are in the same space, i.e. that the function \(\pmb {u}(t)\) can be written in the form:

$$\begin{aligned} \pmb {u}(t)=\pmb {\Phi }_1(t)\pmb {\omega } \end{aligned}$$

we have

$$\begin{aligned} {{\mathrm{PEN}}}_2(\pmb {u})&= \int _{I_1} \left( \frac{\partial ^2 \pmb {\Phi }_1(t) \pmb {\omega }}{\partial t^2}\right) ' \frac{\partial ^2 \pmb {\Phi }_1(t) \pmb {\omega }}{\partial t^2} dt\\&= \pmb {\omega }' \int _{I_1} \left( \frac{\partial ^2 \pmb {\Phi }_1(t) }{\partial t^2}\right) ' \frac{\partial ^2 \pmb {\Phi }_1(t)}{\partial t^2} dt\ \pmb {\omega }\\&= \pmb {\omega }' \pmb {R}_1 \pmb {\omega }, \end{aligned}$$

where

$$\begin{aligned} \pmb {R}_1 = \int _{I_1} \left( \frac{\partial ^2 \pmb {\Phi }_1(t) }{\partial t^2}\right) ' \frac{\partial ^2 \pmb {\Phi }_1(t)}{\partial t^2} dt. \end{aligned}$$
(8)

Similarly assuming that

$$\begin{aligned} \pmb {v}(t)=\pmb {\Phi }_2(t)\pmb {\nu } \end{aligned}$$

we can obtain

$$\begin{aligned} {{\mathrm{PEN}}}_2(\pmb {v}) = \pmb {\nu }' \pmb {R}_2 \pmb {\nu }, \end{aligned}$$

where

$$\begin{aligned} \pmb {R}_2 = \int _{I_2} \left( \frac{\partial ^2 \pmb {\Phi }_2(t) }{\partial t^2}\right) ' \frac{\partial ^2 \pmb {\Phi }_2(t)}{\partial t^2} dt. \end{aligned}$$
(9)
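The penalty matrices (8) and (9) inherit the block-diagonal structure of \(\pmb {\Phi }_1(t)\) and \(\pmb {\Phi }_2(t)\), so it suffices to compute one diagonal block per variable and assemble the full matrix block-diagonally. A minimal numerical sketch follows; basis_d2, a function returning the second derivatives of one basis block on a grid, is an assumed helper (for common bases, getbasispenalty() in the fda package computes such penalty blocks directly).

```r
# Riemann-sum approximation of one diagonal block of R = int D2(t)' D2(t) dt.
penalty_block <- function(basis_d2, I, ngrid = 1001) {
  tgrid <- seq(I[1], I[2], length.out = ngrid)
  D2    <- basis_d2(tgrid)                  # ngrid x (B_d + 1) second derivatives
  crossprod(D2) * (diff(I) / (ngrid - 1))   # ~ integral of the cross-products
}
```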

Now the first functional canonical correlation \(\rho _1\) and corresponding vector weight functions \(\pmb {u}_1(t)\) and \(\pmb {v}_1(t)\) are defined as

$$\begin{aligned} \rho _1 = \sup _{\pmb {u} \in L_2^p(I_1),\pmb {v} \in L_2^q(I_2)} \frac{{{\mathrm{Cov}}}({<}\pmb {u},\pmb {Y}{>},{<}\pmb {v},\pmb {X}{>})}{\sqrt{{{\mathrm{Var}}}(U^{(N)}) {{\mathrm{Var}}}(V^{(N)})}}, \end{aligned}$$

subject to the constraint that

$$\begin{aligned} {{\mathrm{Var}}}\left( U^{(N)}\right) = {{\mathrm{Var}}}\left( V^{(N)}\right) =1. \end{aligned}$$

In general, the kth functional canonical correlation \(\rho _k\) and the associated vector weight functions \(\pmb {u}_k(t)\) and \(\pmb {v}_k(t)\) are defined as follows:

$$\begin{aligned} \rho _k&=\sup _{\pmb {u} \in L_2^p(I_1),\pmb {v} \in L_2^q(I_2)} {{\mathrm{Cov}}}({<}\pmb {u},\pmb {Y}{>},{<}\pmb {v},\pmb {X}{>})\\&={{\mathrm{Cov}}}({<}\pmb {u_k},\pmb {Y}{>},{<}\pmb {v_k},\pmb {X}{>}), \end{aligned}$$

where \(\pmb {u}_k(t)\) and \(\pmb {v}_k(t)\) are subject to the restrictions (6) and (7), and the kth pair of canonical variables \((U_k,V_k)\) is not correlated with the first \(k-1\) canonical variables, where

$$\begin{aligned} U_k={<}\pmb {u}_k,\pmb {Y}{>},\quad V_k={<}\pmb {v}_k,\pmb {X}{>} \end{aligned}$$

are canonical variables. We refer to this procedure as smoothed canonical correlation analysis. The expression \((\rho _k,\pmb {u}_k(t),\pmb {v}_k(t))\) will be called the kth canonical system of the pair of processes \(\pmb {Y}(t)\) and \(\pmb {X}(t)\).

Let

$$\begin{aligned} {{\mathrm{Var}}}(\pmb {\alpha })&={{\mathrm{E}}}(\pmb {\alpha }\pmb {\alpha }') = \pmb {\Sigma }_{11},\\ {{\mathrm{Var}}}(\pmb {\beta })&={{\mathrm{E}}}(\pmb {\beta }\pmb {\beta }') = \pmb {\Sigma }_{22},\\ {{\mathrm{Cov}}}(\pmb {\alpha }, \pmb {\beta })&={{\mathrm{E}}}(\pmb {\alpha }\pmb {\beta }') =\pmb {\Sigma }_{12}. \end{aligned}$$

Let us consider the canonical variables \(U^{*}={<}\pmb {\omega }, \pmb {\alpha }{>}\) and \(V^{*}={<}\pmb {\nu }, \pmb {\beta }{>}\) of random vectors \(\pmb {\alpha }\) and \(\pmb {\beta }\) respectively. The kth canonical correlation \(\gamma _k\) and associated vector weights \(\pmb {\omega }_k\) and \(\pmb {\nu }_k\) are defined as

$$\begin{aligned} \gamma _k=\sup _{\pmb {\omega } \in \mathbb {R}^{K_1+p}, \pmb {\nu } \in \mathbb {R}^{K_2+q}} {{\mathrm{Cov}}}({<}\pmb {\omega },\pmb {\alpha }{>}, {<}\pmb {\nu },\pmb {\beta }{>}) = \pmb {\omega }_k' \pmb {\Sigma }_{12} \pmb {\nu }_k, \end{aligned}$$

subject to the restriction

$$\begin{aligned} \pmb {\omega }_k' (\pmb {\Sigma }_{11} + \lambda \pmb {R}_1) \pmb {\omega }_k&= 1,\\ \pmb {\nu }_k' (\pmb {\Sigma }_{22} + \lambda \pmb {R}_2) \pmb {\nu }_k&= 1, \end{aligned}$$

where \(\pmb {R}_1\) and \(\pmb {R}_2\) are given by (8) and (9) respectively, and the kth canonical variables \((U^{*}_k, V^{*}_k)\) are not correlated with the first \(k-1\) canonical variables. The expression \((\gamma _k,\pmb {\omega }_k, \pmb {\nu }_k)\) will be called the kth canonical system of the random vectors \(\pmb {\alpha }\) and \(\pmb {\beta }\).

Theorem 3

The kth canonical system \((\rho _k,\pmb {u_k}(t),\pmb {v_k}(t))\) of the pair of random processes \(\pmb {Y}(t)\) and \(\pmb {X}(t)\) is related to the kth canonical system \((\gamma _k,\pmb {\omega }_k,\pmb {\nu }_k)\) of the pair of the random vectors \(\pmb {\alpha }\) and \(\pmb {\beta }\) by the equations:

$$\begin{aligned} \rho _k = \gamma _k,\ \pmb {u}_k(t)=\pmb {\Phi }_1(t) \pmb {\omega }_k, t \in I_1,\ \pmb {v}_k(t)=\pmb {\Phi }_2(t) \pmb {\nu }_k, t \in I_2, \end{aligned}$$

where \(1\le k \le \min (K_1+p,K_2+q)\), \(K_1=E_1+...+E_p\), \(K_2=F_1+...+F_q\).

Proof

Without loss of generality we may assume that the covariance matrices \(\pmb {\Sigma }_{11}\) and \(\pmb {\Sigma }_{22}\) are of full rank. As in the proof of Theorem 1, it may be assumed that the vector weight function \(\pmb {u}(t)\) and the process \(\pmb {Y}(t)\) are in the same space, i.e. the function \(\pmb {u}(t)\) can be written in the form:

$$\begin{aligned} \pmb {u}(t)= \pmb {\Phi }_1(t) \pmb {\omega }, \end{aligned}$$

where \(\pmb {\omega } \in \mathbb {R}^{K_1+p}\). Then

$$\begin{aligned} {<} \pmb {u}, \pmb {Y}{>} = \pmb {\omega }' \pmb {\alpha }. \end{aligned}$$

Similarly for \(\pmb {v}\in L_2^q(I_2)\)

$$\begin{aligned} {<} \pmb {v}, \pmb {X}{>} = \pmb {\nu }' \pmb {\beta }, \end{aligned}$$

where \(\pmb {\nu } \in \mathbb {R}^{K_2+q}\). Hence

$$\begin{aligned} {{\mathrm{E}}}({<} \pmb {u}, \pmb {Y}{>})&= \pmb {\omega }' {{\mathrm{E}}}(\pmb {\alpha }) = \pmb {\omega }' \pmb {0} =0,\\ {{\mathrm{E}}}({<} \pmb {v}, \pmb {X}{>})&= \pmb {\nu }' {{\mathrm{E}}}(\pmb {\beta }) = \pmb {\nu }' \pmb {0} =0,\\ {{\mathrm{Var}}}({<} \pmb {u}, \pmb {Y}{>})&= \pmb {\omega }' \pmb {\Sigma }_{11} \pmb {\omega },\\ {{\mathrm{Var}}}({<} \pmb {v}, \pmb {X}{>})&= \pmb {\nu }' \pmb {\Sigma }_{22} \pmb {\nu },\\ {{\mathrm{Cov}}}({<} \pmb {u}, \pmb {Y}{>}, {<} \pmb {v}, \pmb {X}{>})&= \pmb {\omega }' \pmb {\Sigma }_{12} \pmb {\nu }. \end{aligned}$$

Let us consider the first canonical correlation between the processes \(\pmb {Y}(t)\) and \(\pmb {X}(t)\):

$$\begin{aligned} \rho _1&=\sup _{\pmb {u} \in L_2^p(I_1),\pmb {v} \in L_2^q(I_2)} {{\mathrm{Cov}}}({<}\pmb {u},\pmb {Y}{>},{<}\pmb {v},\pmb {X}{>})\\&={{\mathrm{Cov}}}({<}\pmb {u}_1,\pmb {Y}{>},{<}\pmb {v}_1,\pmb {X}{>}), \end{aligned}$$

where \(\pmb {u}(t)\) and \(\pmb {v}(t)\) are subject to the restrictions (6) and (7). This is equivalent to saying that

$$\begin{aligned} \gamma _1=\sup _{\pmb {\omega } \in \mathbb {R}^{K_1+p}, \pmb {\nu } \in \mathbb {R}^{K_2+q}} \pmb {\omega }' \pmb {\Sigma }_{12} \pmb {\nu } = \pmb {\omega }_1' \pmb {\Sigma }_{12} \pmb {\nu }_1, \end{aligned}$$

subject to the restriction

$$\begin{aligned} \pmb {\omega }_1' (\pmb {\Sigma }_{11} + \lambda \pmb {R}_1) \pmb {\omega }_1&= 1,\\ \pmb {\nu }_1' (\pmb {\Sigma }_{22} + \lambda \pmb {R}_2) \pmb {\nu }_1&= 1, \end{aligned}$$

where \(\pmb {R}_1\) and \(\pmb {R}_2\) are given by (8) and (9) respectively. This is the definition of the first canonical correlation between the random vectors \(\pmb {\alpha }\) and \(\pmb {\beta }\).

On the other hand, if we begin with the first canonical system \((\gamma _1,\pmb {\omega }_1,\pmb {\nu }_1)\) of the pair of random vectors \(\pmb {\alpha }\) and \(\pmb {\beta }\), we will obtain the first canonical system for the processes \(\pmb {Y}(t)\) and \(\pmb {X}(t)\) from the equation

$$\begin{aligned} \rho _1 = \gamma _1,\ \pmb {u}_1(t)=\pmb {\Phi }_1(t) \pmb {\omega }_1,\ \pmb {v}_1(t)=\pmb {\Phi }_2(t) \pmb {\nu }_1. \end{aligned}$$

We may extend these considerations to the second canonical system and so on. \(\square \)

Canonical correlation analysis for the random vectors \(\pmb {\alpha }\) and \(\pmb {\beta }\) is based on the matrices \(\pmb {\Sigma }_{11}, \pmb {\Sigma }_{22}\) and \(\pmb {\Sigma }_{12}\), which are unknown. We estimate them on the basis of n independent realizations \(\pmb {y}_1(t),\pmb {y}_2(t),...,\pmb {y}_n(t)\) of the random process \(\pmb {Y}(t)\), of the form \(\pmb {y}_i(t)=\pmb {\Phi }_1(t) \hat{\pmb {\alpha }}_i\), and \(\pmb {x}_1(t),\pmb {x}_2(t),...,\pmb {x}_n(t)\) of the random process \(\pmb {X}(t)\), of the form \(\pmb {x}_i(t)=\pmb {\Phi }_2(t) \hat{\pmb {\beta }}_i\), \(i=1,2,...,n\), where

$$\begin{aligned} \hat{\pmb {\alpha }}_i&= \big (\hat{\alpha }_{10}^{(i)},...,\hat{\alpha }_{1E_1}^{(i)},...,\hat{\alpha }_{p0}^{(i)},...,\hat{\alpha }_{pE_p}^{(i)}\big )',\\ \hat{\pmb {\beta }}_i&= \big (\hat{\beta }_{10}^{(i)},...,\hat{\beta }_{1F_1}^{(i)},...,\hat{\beta }_{q0}^{(i)},...,\hat{\beta }_{qF_q}^{(i)}\big )'. \end{aligned}$$

Let

$$\begin{aligned} \hat{\pmb {A}}&= (\hat{\pmb {\alpha }}_1,...,\hat{\pmb {\alpha }}_n)',\\ \hat{\pmb {B}}&= (\hat{\pmb {\beta }}_1,...,\hat{\pmb {\beta }}_n)'. \end{aligned}$$

Finally the estimators of the matrices \(\pmb {\Sigma }_{11},\pmb {\Sigma }_{22}\) and \(\pmb {\Sigma }_{12}\) have the form:

$$\begin{aligned} \hat{\pmb {\Sigma }}_{11}&= \frac{1}{n} \hat{\pmb {A}}' \hat{\pmb {A}},\\ \hat{\pmb {\Sigma }}_{22}&= \frac{1}{n} \hat{\pmb {B}}' \hat{\pmb {B}},\\ \hat{\pmb {\Sigma }}_{12}&= \frac{1}{n} \hat{\pmb {A}}' \hat{\pmb {B}}. \end{aligned}$$

Let \(\hat{\pmb {C}} = \hat{\pmb {\Sigma }}_{11}^{-1} \hat{\pmb {\Sigma }}_{12}\) and \(\hat{\pmb {D}} = \hat{\pmb {\Sigma }}_{22}^{-1} \hat{\pmb {\Sigma }}_{21}\), where \(\hat{\pmb {\Sigma }}_{21}=\hat{\pmb {\Sigma }}_{12}'\). The matrices \(\hat{\pmb {C}}\hat{\pmb {D}}\) and \(\hat{\pmb {D}}\hat{\pmb {C}}\) have the same nonzero eigenvalues \(\hat{\gamma }_k^2\), and their corresponding eigenvectors \(\hat{\pmb {\omega }}_k\) and \(\hat{\pmb {\nu }}_k\) are given by the equations:

$$\begin{aligned}&(\hat{\pmb {C}}\hat{\pmb {D}} - \hat{\gamma }_k^2 \pmb {I}_{K_1+p}) \hat{\pmb {\omega }}_k = \pmb {0},\\&(\hat{\pmb {D}}\hat{\pmb {C}} - \hat{\gamma }_k^2 \pmb {I}_{K_2+q}) \hat{\pmb {\nu }}_k = \pmb {0}. \end{aligned}$$

where \(1\le k \le \min (K_1+p,K_2+q)\).

Hence the coefficients of the projection of the ith realization \(\pmb {y}_i(t)\) of process \(\pmb {Y}(t)\) on the kth functional canonical variable are equal to

$$\begin{aligned} \hat{U}_{ik}={<}\hat{\pmb {u}}_k,\pmb {y}_{i}{>}=\int _{I_1} \hat{\pmb {u}}'_k(t)\pmb {y}_{i}(t)dt=\hat{\pmb {\alpha }}'_i\hat{\pmb {\omega }}_k. \end{aligned}$$

Analogously the coefficients of the projection of the ith realization \(\pmb {x}_{i}(t)\) of the process \(\pmb {X}(t)\) on the kth functional canonical variable are equal to

$$\begin{aligned} \hat{V}_{ik}= \hat{\pmb {\beta }}'_i\hat{\pmb {\nu }}_k, \end{aligned}$$

where \(i=1,2,\ldots ,n\), \(k=1,\ldots ,\min (K_1+p,K_2+q)\).
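A sketch of these sample computations follows, with Ahat and Bhat the n x \((K_1+p)\) and n x \((K_2+q)\) centered coefficient matrices, R1 and R2 the penalty matrices from (8) and (9), and lambda a smoothing parameter; all are assumed inputs, and folding the penalties into the variance matrices mirrors the constraints above (with lambda = 0 this reduces to the unpenalized estimators \(\hat{\pmb {C}}\) and \(\hat{\pmb {D}}\)).

```r
mfcca <- function(Ahat, Bhat, R1, R2, lambda = 0) {
  n   <- nrow(Ahat)
  S11 <- crossprod(Ahat) / n + lambda * R1       # Sigma11-hat (penalized)
  S22 <- crossprod(Bhat) / n + lambda * R2       # Sigma22-hat (penalized)
  S12 <- crossprod(Ahat, Bhat) / n               # Sigma12-hat
  C   <- solve(S11, S12)                         # Sigma11^{-1} Sigma12
  D   <- solve(S22, t(S12))                      # Sigma22^{-1} Sigma21
  eigCD <- eigen(C %*% D)                        # gamma_k^2 and omega_k
  s     <- min(ncol(Ahat), ncol(Bhat))
  omega <- Re(eigCD$vectors[, 1:s, drop = FALSE])
  nu    <- D %*% omega                           # paired nu_k, up to scaling
  # Rescale so that omega_k' S11 omega_k = nu_k' S22 nu_k = 1.
  omega <- sweep(omega, 2, sqrt(colSums(omega * (S11 %*% omega))), "/")
  nu    <- sweep(nu,    2, sqrt(colSums(nu    * (S22 %*% nu))),    "/")
  list(rho = sqrt(pmax(Re(eigCD$values[1:s]), 0)),  # canonical correlations
       U   = Ahat %*% omega,                        # scores omega_k' alpha_i
       V   = Bhat %*% nu)                           # scores nu_k' beta_i
}
```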

6 Example

The following data (Fig. 1) come from the online database of the World Bank (http://data.worldbank.org/). For the analysis, fifty-four countries of the world were chosen (\(n=54\)). The variables were recorded in the years 1972–2009 (\(J = 38\)). Each country belongs to one of four classes (\(L=4\)):

  1. Low-income economies (GDP $1,025 or less), \(n_1=3\)

  2. Lower-middle-income economies (GDP $1,026 to $4,035), \(n_2=19\)

  3. Upper-middle-income economies (GDP $4,036 to $12,475), \(n_3=14\)

  4. High-income economies (GDP $12,476 or more), \(n_4=18\)

and was characterized by four variables:

  1. \(X_1\): GDP growth (annual %)—Annual percentage growth rate of GDP at market prices based on constant local currency. Aggregates are based on constant 2000 U.S. dollars. GDP is the sum of gross value added by all resident producers in the economy plus any product taxes and minus any subsidies not included in the value of the products. It is calculated without making deductions for depreciation of fabricated assets or for depletion and degradation of natural resources.

  2. \(X_2\): Energy use (rate of growth in kg of oil equivalent per capita)—Energy use refers to use of primary energy before transformation to other end-use fuels, which is equal to indigenous production plus imports and stock changes, minus exports and fuels supplied to ships and aircraft engaged in international transport.

  3. \(X_3\): CO\(_2\) emissions (rate of growth in kt)—Carbon dioxide emissions are those stemming from the burning of fossil fuels and the manufacture of cement. They include carbon dioxide produced during consumption of solid, liquid, and gas fuels and gas flaring.

  4. \(X_4\): Population in urban agglomerations of more than 1 million (% of total population)—Population in urban agglomerations of more than one million is the percentage of a country's population living in metropolitan areas that in 2000 had a population of more than one million people.

Fig. 1 Data set trajectories

The data were transformed to functional data by the method described in Sect. 2. The calculations were performed using the Fourier basis system, which is a typical choice, although others such as splines, polynomials or wavelets can also be used. The optimum values of B, selected using the BIC criterion, for \(X_1, X_2, X_3\) and \(X_4\) are 2, 2, 2 and 6 respectively. The time interval \([0,T]=[0,38]\) was divided into moments of time in the following way: \(t_1=0.5\) (1972), \(t_2=1.5\) (1973),..., \(t_{38}=37.5\) (2009).

We used the R package fda (Ramsay et al. 2009) to create the Fourier basis system and to convert the raw data into functional objects. The other procedures were implemented by us (Table 1).
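A hedged sketch of this preprocessing with fda follows; the raw data matrix rawX, holding one variable as a 38 x 54 matrix of yearly values (time points in rows, countries in columns), is an illustrative assumption.

```r
library(fda)
tj    <- seq(0.5, 37.5, by = 1)                 # t_1 = 0.5 (1972), ..., t_38 = 37.5 (2009)
basis <- create.fourier.basis(rangeval = c(0, 38), nbasis = 3)  # B = 2, i.e. 3 basis functions
fdobj <- smooth.basis(argvals = tj, y = rawX, fdParobj = basis)$fd
chat  <- fdobj$coefs                            # estimated coefficients, one column per country
```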

6.1 Multivariate functional principal component analysis (MFPCA)

The statistical objects in functional principal component analysis are 54 countries (\(n=54\)) characterized by four (\(p=4\)) pieces of functional data \(\pmb {x}_i(t)=(x_{i1}(t),x_{i2}(t),x_{i3}(t),x_{i4}(t))'\), \(t \in [0,38]\), \(i=1,2,...,54\). No account is taken of an object's membership of one of the four defined groups of countries. The vector functions \(\pmb {x}_1(t),\pmb {x}_2(t),...,\pmb {x}_{54}(t)\) have the form

$$\begin{aligned} \pmb {x}_i(t)=\pmb {\Phi }(t) \hat{\pmb {c}}_i, \end{aligned}$$

where \(\pmb {\Phi }(t)\) is a matrix with the form of (3), and the vector \(\hat{\pmb {c}}_i\) has the form

$$\begin{aligned} \hat{\pmb {c}}_i=(\hat{c}_{i10},...,\hat{c}_{i1B_{1}},...,\hat{c}_{i40},...,\hat{c}_{i4B_{4}})', \end{aligned}$$

where \(B_{1}=B_{2}=B_{3}=2, B_{4}=6, i=1,2,...,54\). In the first step, from the vectors \(\hat{\pmb {c}}_1,\hat{\pmb {c}}_2,...,\hat{\pmb {c}}_{54}\) we build the matrix \(\hat{\pmb {\Sigma }}\), and next we find its eigenvalues \(\hat{\gamma }_k\) and the corresponding vectors \(\hat{\pmb {\omega }}_k\). The ratios of the particular eigenvalues to the sum of all eigenvalues, expressed as percentages, are shown in Fig. 2. It can be seen from Fig. 2 that \(94.8\,\%\) of the total variation is accounted for by the first functional principal component. In the second step we form the vector weight functions

$$\begin{aligned} \hat{\pmb {u}}_k(t)=\pmb {\Phi }(t) \hat{\pmb {\omega }}_k, \end{aligned}$$

where \(k=1,...,16\), and the corresponding functional principal components in the form

$$\begin{aligned} \hat{U}_k={<}\hat{\pmb {u}}_k,\pmb {X}{>}. \end{aligned}$$

The graphs of the four components of the vector weight functions for the first and second functional principal components appear in Fig. 3. The values of the coefficients of the vector weight functions corresponding to the first and second functional principal components are given in Table 2. At a given time point t, the greater the absolute value of a component of the vector weight function, the greater the contribution of the corresponding component process to the structure of the given functional principal component. From Fig. 3 (left) it can be seen that the greatest contribution to the structure of the first functional principal component comes from the process \(X_4(t)\), and this holds for all of the observation years considered. Figure 3 (right) shows that, on specified time intervals, the greatest contribution to the structure of the second functional principal component comes alternately from the processes \(X_2(t)\) and \(X_1(t)\). The total contribution of a particular original process \(X_i(t)\) to the structure of a given functional principal component is equal to the area under the modulus of the weight function corresponding to that process. These contributions for the four components of the vector process \(\pmb {X}(t)\), and the first and second functional principal components, are given in Table 2. The relative positions of the 54 countries in the system of the first two functional principal components are shown in Fig. 4. The system of the first two functional principal components retains \(96.3\,\%\) of the total variation. From Fig. 4 we see that the 54 countries form a relatively homogeneous group, with the exception of Singapore (SGP), Korea Rep. (KOR) and China (CHN).

6.2 Multivariate functional discriminant coordinates (MFDCA)

In the construction of functional discriminant coordinates, by contrast with the construction of functional principal components, we additionally take account of the information concerning the division of the 54 countries into four disjoint groups (\(L=4\)). From the vectors \(\hat{\pmb {c}}_i\) we build the estimator \(\hat{\pmb {B}}\) of the matrix of between-class variation, and the estimator \(\hat{\pmb {T}}\) of the matrix of total variation, and then we find the non-zero eigenvalues \(\hat{\gamma }_k\) of the matrix \(\hat{\pmb {T}}^{-1}\hat{\pmb {B}}\) and the corresponding vectors \(\hat{\pmb {\omega }}_k\), \(k=1,2,3\). The ratios of particular eigenvalues to the sum of all eigenvalues, expressed as percentages, are shown in Fig. 5. It can be seen from Fig. 5 that \(74.7\,\%\) of the total variation is accounted for by the first functional discriminant coordinate. In the second step we form the vector weight functions

$$\begin{aligned} \hat{\pmb {u}}_k(t)=\pmb {\Phi }(t) \hat{\pmb {\omega }}_k, \end{aligned}$$

where \(k=1,2,3\), and the corresponding functional discriminant coordinates in the form

$$\begin{aligned} \hat{U}_k={<}\hat{\pmb {u}}_k,\pmb {X}{>}. \end{aligned}$$

The graphs of the four components of the vector weight function for the first and second functional discriminant coordinates appear in Fig. 6. The values of the coefficients of the vector weight functions corresponding to the first and second functional discriminant coordinates are given in Table 3. At a given time point t, the greater the absolute value of a component of the vector weight function, the greater the contribution of the corresponding component process to the structure of the given functional discriminant coordinate. Figure 6 (left) shows that the greatest contribution to the structure of the first and second functional discriminant coordinates comes from the process \(X_4(t)\), and this holds for all of the observation years considered. As in the case of the functional principal components, the total contribution of a particular original process \(X_i(t)\) to the structure of a particular functional discriminant coordinate can be estimated using the area under the modulus of the weight function corresponding to that process. These contributions for the four components of the vector process \(\pmb {X}(t)\) and the first and second functional discriminant coordinates are given in Table 3. The relative positions of the 54 countries in the system of the first two functional discriminant coordinates are shown in Fig. 7. The system of the first two functional discriminant coordinates retains \(93.9\,\%\) of the total variation. Compared with the projection onto the first two functional principal components, the division into four groups is more clearly visible here. The country most clearly different from the others is the Democratic Republic of the Congo (COD). In the group of developed countries, Finland (FIN) is clearly different from the other countries.

6.3 Multivariate functional canonical analysis (MFCCA)

In the construction of functional canonical variables we do not take account of the division of the 54 countries into four groups, and we divide the four-dimensional stochastic process into two parts: \(\pmb {Y}(t)=(X_2(t),X_3(t))'\) and \(\pmb {X}(t)=(X_1(t),X_4(t))'\). In our case \(p=q=2\). We are interested in the relationship between the processes \(\pmb {Y}(t)\) and \(\pmb {X}(t)\). We build estimators of the matrices \(\pmb {\Sigma }_{11},\pmb {\Sigma }_{22}\) and \(\pmb {\Sigma }_{12}\), and we then find the non-zero eigenvalues \(\hat{\gamma }^2_k\) and corresponding vectors \(\hat{\pmb {\omega }}_k\) of the matrix \(\hat{\pmb {C}}\hat{\pmb {D}}\), and the eigenvalues \(\hat{\gamma }^2_k\) and corresponding vectors \(\hat{\pmb {\nu }}_k\) of the matrix \(\hat{\pmb {D}}\hat{\pmb {C}}\), where \(\hat{\pmb {C}} = \hat{\pmb {\Sigma }}_{11}^{-1} \hat{\pmb {\Sigma }}_{12}\) and \(\hat{\pmb {D}} = \hat{\pmb {\Sigma }}_{22}^{-1} \hat{\pmb {\Sigma }}_{21}, \hat{\pmb {\Sigma }}_{21}=\hat{\pmb {\Sigma }}_{12}', k=1,...,6\). The eigenvalues \(\hat{\gamma }_k\), called canonical correlations, are shown in Fig. 8. In the second step we form vector weight functions

$$\begin{aligned} \hat{\pmb {u}}_k(t)=\pmb {\Phi }_1(t) \hat{\pmb {\omega }}_k,\ \hat{\pmb {v}}_k(t)=\pmb {\Phi }_2(t) \hat{\pmb {\nu }}_k, \end{aligned}$$

corresponding to the processes \(\pmb {Y}(t)\) and \(\pmb {X}(t)\), where \(k=1,...,6\). Corresponding to these functions are the functional canonical variables of the form

$$\begin{aligned} \hat{U}_k={<}\hat{\pmb {u}}_k,\pmb {Y}{>},\ \hat{V}_k={<}\hat{\pmb {v}}_k,\pmb {X}{>}, \end{aligned}$$

corresponding to the processes \(\pmb {Y}(t)\) and \(\pmb {X}(t)\). The graphs of the two components of the vector weight function for the first and second functional canonical variables of the processes \(\pmb {Y}(t)\) and \(\pmb {X}(t)\) are shown in Figs. 9 and 10. Table 4 contains the values of the coefficients of the vector weight functions, together with the total contribution from each process to the structure of the corresponding functional canonical variable. The relative positions of the 54 countries in the system \((\hat{U}_1,\hat{V}_1)\) of functional canonical variables are shown in Fig. 11. The strong correlation (\(\rho _1=0.951\)) between the processes \(\pmb {X}(t)\) and \(\pmb {Y}(t)\) means that in the system of canonical variables \((\hat{U}_1,\hat{V}_1)\) the points representing individual countries lie almost on a straight line. In terms of the correlation between the processes \(\pmb {X}(t)\) and \(\pmb {Y}(t)\), the 54 countries form a relatively homogeneous group, with the exception of Singapore (SGP) and Saudi Arabia (SAU).

7 Conclusions and future work

This paper introduces and analyzes new methods of constructing canonical variables and discriminant coordinates for multivariate functional data. In addition, we reviewed principal component analysis for such data (Jacques and Preda 2014). FDA is an important tool that can be used for exploratory data analysis. A primary advantage is the ability to assess continuous data without reducing the signal to discrete variables. By representing each curve as a function, it is possible to use functional analogues of classical methods. Functional methods (1) allow more complex dynamics than classical methods; (2) utilize a nonparametric smoothing technique to reduce the observational error; and (3) solve the inverse and multicollinearity problems caused by the "curse of dimensionality".

The proposed methods were applied to geographic economic multivariate time series. Our research has shown, on this example, that the use of multivariate projective dimension reduction techniques gives good results and provides an attractive and flexible way to analyse such data. Of course, the performance of the algorithms needs to be further evaluated on additional real and artificial data sets.

In a similar way, we would like to extend manifold dimension reduction techniques such as multidimensional scaling (Borg and Groenen 2005), isometric feature mapping (Tenenbaum et al. 2000) or maximum variance unfolding (Weiss 1999) from univariate functional data to the multivariate case. This is the direction of our future research.