Selected statistical methods of data analysis for multivariate functional data

Data in the form of a continuous vector function on a given interval are referred to as multivariate functional data. These data are treated as realizations of multivariate random processes. The paper is devoted to three statistical dimension reduction techniques for multivariate data. For the first one, principal components analysis, the authors present a review of a recent paper (Jacques and Preda in, Comput Stat DataAnal, 71:92–106, 2014). For two others one, canonical variables and discriminant coordinates, the authors extend existing works for univariate functional data to multivariate. These methods for multivariate functional data are presented, illustrated and discussed in the context of analyzing real data sets. Each of these techniques is applied on real data set.


Introduction
In recent years, methods for representing data by functions or curves have received much attention. Such data are known in the literature as functional data (Bongiorno et al. 2014;Ferraty and Vieu 2006;Horváth and Kokoszka 2012;Ramsay and Silverman 2005). Examples of functional data can be found in various application domains, such as medicine, economics, meteorology and many others. In previous papers on functional data analysis, objects are characterized only by one feature observed at B Tomasz Górecki tomasz.gorecki@amu.edu.pl 1 Faculty of Mathematics and Computer Science, Adam Mickiewicz University, Umultowska 87, 61-614 Poznań, Poland many time points. Methods of functional data analysis are becoming increasingly popular, e.g. in the cluster analysis (Jacques and Preda 2013;James and Sugar 2003;Peng and Müller 2008), classification (Chamroukhi et al. 2013;Delaigle and Hall 2012;Mosler and Mozharovskyi 2015;Rossi and Villa 2006) and regression (Ferraty et al. 2012;Goia and Vieu 2014;Kudraszow and Vieu 2013;Peng et al. 2015;Rachdi and Vieu 2006;Wang et al. 2015). Unfortunately, multivariate data methods cannot be directly used for functional data, because of the problem of dimensionality and difficulty in putting functional data into order. In many applications there is a need to use statistical methods for objects characterized by many features observed at many time points (double multivariate data), and such data are called multivariate functional data. A pioneering theoretical works were Besse (1979), Besse and Ramsay (1986), where random variables take values in a general Hilbert space. Saporta (1981) presents an analysis of multivariate functional data from the point of view of factorial methods (principal components and canonical analysis). Berrenderoa et al. (2011), Jacques and Preda (2014) and Panaretos et al. (2010) discussed principal component analysis for multivariate functional data (MFPCA). In this paper we extend the construction of principal component analysis to other projective dimension reduction techniques, i.e. discriminant coordinates and canonical correlation analysis for multivariate functional data.
Dimension reduction is a very active field of statistical research. We focused only on projective dimension reduction techniques (Burges 2009): principal components analysis (PCA), canonical correlation analysis (CCA) and Fisher discriminant analysis (DCA). These procedures are a transformation that allows us to obtain a linear projection of our data, originally in R p onto R k , where k < p. Along with reducing the data dimensions, the data are also projected in a different orientation. Altogether, this transformation presents the data in a manner that stresses out the trends in it facilitating its interpretation.
The rest of this paper is organized as follows. We first review the concept of transformation of discrete data to multivariate functional data (Sect. 2). Section 3 contains a review principal components analysis for multivariate functional data. Sections 4 and 5 contain our extension of existing works for univariate functional data to multivariate, respectively for CCA and discriminant coordinates. Section 6 contains the results of our experiments on the real data set. Conclusions are given in Sect. 7.

Transformation of discrete data to multivariate functional data
Let X (t) be a stochastic process with continuous parameter t ∈ I . Moreover, assume that X ∈ L 2 (I ), where L 2 (I ) is a Hilbert space of square integrable functions on the interval I and that the process X (t) has the following representation: where {ϕ b } are orthonormal basis functions, and c 0 , c 1 , . . . , c B are the random coefficients.
Many financial, meteorological and other data are recorded at discrete moments in time. Let x j denote an observed value of process X (t) at the jth time point t j , where I is a compact set such that t j ∈ I , for j = 1, ..., J . Then our data consist of J pairs (t j , x j ). This discrete data can be smoothed by continuous function x(t), where t ∈ I (Ramsay and Silverman 2005).
Let (1) is estimated by the least squares method, that is, so as to minimize the function: Differentiating S(c c c) with respect to the vector c c c, we obtain the least squares method estimatorĉ Then The degree of smoothness of the function x(t) depends on the value B (a small value of B causes more smoothing of the curves). The optimum value for B is selected using the Bayesian information criterion (BIC): We decided to use such criterion because Akaike Information Criterion (AIC) better measures predictive accuracy while BIC better measures goodness of fit (Berk 2008;Shmueli 2010;Sober 2002).
Let us assume that there are n independent pairs of values (t i j , x i j ), j = 1, ..., J , i = 1, ..., n. These discrete data are smoothed to continuous functions in the following form: Among all the B 1 , B 2 , ..., B n one common value of B is chosen, as the modal value of the numbers B 1 , B 2 , ..., B n .
The set of functions {x 1 (t), ..., x n (t) : t ∈ I } obtained in this way is called functional data (see Ramsay and Silverman 2005). Note that in some cases it could be interesting to discretize smooth functions. This is at the center of variable selection methods in functional data analysis (Aneiros and Vieu 2014).
So far we have been dealing with data characterized by one feature. Our considerations can be generalized to the case of p ≥ 2 features. Then our data consist of n independent vector functions x x : t ∈ I } will be called multivariate functional data. Multivariate functional data can conveniently be treated as realizations of a finite multidimensional stochastic process X X X (t) = (X 1 (t), X 2 (t), ..., X p (t)) with continuous parameter t ∈ I . We will further assume, that X X X ∈ L p 2 (I ), where L 2 (I ) is a Hilbert space of square integrable functions on the interval I equipped with the following inner product: We consider the case, where the dth component of process X X X (t) can be represented by a finite number of orthonormal basis functions where c db are random variables. Let c c c = (c 10 , ..., c 1B 1 , ..., c p0 , ..., c pB p ) ,

Principal component analysis for multivariate functional data
The idea of principal component analysis is to reduce the dimensionality of a data set consisting of a large number of correlated variables, while retaining as much as possible of the variation present in the data set. This is achieved by transforming to a new set of variables, the principal components, which are uncorrelated, and which are ordered so that the first few retain most of the variation present in all of the original variables. Suppose that we are observing a p-dimensional random vector X X X = (X 1 , X 2 , ..., X p ) ∈ R p . In the first step we look for a linear combination U 1 = u 11 X 1 + u 12 X 2 + ... + u 1 p X p = u u u 1 X X X of the elements of vector X X X having maximum variance. The variable U 1 is called the first principal component. Next, we look for a linear combination U 2 = u u u 2 X X X , uncorrelated with the first principal component U 1 , having maximum variance, and so on, so that at the kth stage a linear combination U k = u u u k X X X , called the kth principal component, is found that has maximum variance subject to being uncorrelated with the first k − 1 principal components (Jolliffe 2002). The observations can be presented graphically as points on a plane (U 1 , U 2 ). The functional case of PCA (FPCA) is a more informative way of looking at the variability structure in the variance-covariance operator for one dimensional functional data (Górecki and Krzyśko 2012). In this section we present PCA for multivariate functional data (Jacques and Preda 2014). Without loss of generality we will further assume, that E(X X X ) = 0 0 0. In principal component analysis in the multivariate functional case, we are interested in finding the inner product having maximal variance for all u u u ∈ L p 2 (I ) such, that <u u u, u u u> = 1. Let where <u u u 1 , u u u 1 > = 1. The inner product U 1 = <u u u 1 , X X X > will be called the first principal component, and the vector function u u u 1 will be called the first vector weight function. Subsequently we look for the second principal component U 2 = <u u u 2 , X X X >, which maximizes Var(<u u u, X X X >), is such that <u u u 2 , u u u 2 > = 1, and is not correlated with the first functional principal component U 1 , i.e. is subject to the restriction <u u u 1 , u u u 2 > = 0.
In general, the kth functional principal component U k = <u u u k , X X X > satisfies the conditions: The expression (λ k , u u u k (t)) will be called the kth principal system of the process X X X (t).
In sect. 2 we have shown, that the process X X X (t) can be represented as X X X where κ 1 , κ 2 = 1, ..., k, K = B 1 + ... + B p . The expression (γ k , ω ω ω k ) will be called the kth principal system of vector c c c.
Determining the kth principal system of vector c c c is equivalent to solving for the eigenvalue and corresponding eigenvectors of the covariance matrix of that vector, standardized so that ω ω ω κ 1 ω ω ω κ 2 = δ κ 1 κ 2 .
Theorem 1 The kth principal system (λ k , u u u k (t)) of the stochastic process X X X (t) is related to the kth principal system (γ k , ω ω ω k ) of the random vector c c c by the equations: where k = 1, ..., s and s = rank( ).
Proof It may be assumed (Ramsay and Silverman 2005), that the vector weight function u u u(t) and the process X X X (t) are in the same space, i.e. the function u u u(t) can be written in the form: where ω ω ω ∈ R K + p . Then Let us consider the first functional principal component of process X X X (t): where <u u u 1 , u u u 1 > = 1. This is equivalent to saying that where ω ω ω 1 ω ω ω 1 = 1. This is the definition of the first principal component of the random vector c c c. On the other hand, if we begin with the first principal system of the random vector c c c defined by (γ 1 , ω ω ω 1 ), we will obtain the first principal system for the process X X X (t) from the equations We may extend these considerations to the second principal system and so on.
Moreover the kth principal system of the random process X X X (t) determined from a sample has the following form: Hence the coefficients of the projection of the ith realization x x x i (t) of process X X X (t) on the kth functional principal component are equal to: Finally the coefficients of the projection of the ith realization x x x i (t) of the process X X X (t) on the plane of the first two functional principal components from the sample are equal to (ω ω ω 1ĉ c c i ,ω ω ω 2ĉ c c i ), i = 1, 2, ..., n.

Discriminant coordinates for multivariate functional data
Now let us consider the case where the samples originate from L groups. We would often like to present them graphically, to see their configuration or to eliminate outlying observations. However it may be difficult to produce such a presentation even if only three features are observed. A different method must therefore be sought for presenting multidimensional data originating from multiple groups. To make the task easier, in the first step every p-dimensional observation X X X = (X 1 , X 2 , ..., X p ) ∈ R p can be transformed into a one-dimensional observation U 1 = u 11 X 1 + u 12 X 2 + ... + u 1 p X p = u u u 1 X X X , and the resulting one-dimensional observations can be presented graphically as points on a straight line. In the second step we can define a second linear combination U 2 = u u u 2 X X X not correlated with the first, and present the observations graphically as points on a plane (U 1 , U 2 ). The space of discriminant coordinates is a space convenient for the use of various classification methods (methods of discriminant analysis). When L = 2 we obtain only one discriminant coordinate, coinciding with the well-known Fisher's linear discriminant function (Fisher 1936). The functional case of discriminant coordinates analysis (FDCA) and its kernel variant (KFDCA) are also well known (Górecki et al. 2014). In this section we propose FDCA for multivariate functional data (MFDCA). Let x x x l1 (t), x x x l2 (t), ..., x x x ln l (t) be n l independent realizations of a p-dimensional stochastic process X X X (t) belonging to the lth class, where l = 1, 2, . . . , L. Our purpose is to construct the discriminant coordinate based on multivariate functional data, i.e. to construct such that their between-class variance is maximal compared with the total variance, More precisely, the first functional discriminant coordinate U 1 = <u u u 1 , X X X > is defined as subject to the constraint where Var B (<u u u, X X X >) and Var T (<u u u, X X X >) are respectively the between-class and total variance of discriminant coordinate U 1 . Condition (4) ensures the uniqueness of the first discriminant coordinate U 1 .
Similarly we can construct the kth functional discriminant coordinate where the vector weight function u u u k (t) is defined as subject to the constraint Moreover the kth discriminant coordinate U k is not correlated with the first k − 1 discriminant coordinates. The expression (λ k , u u u k (t)) will be called the kth discriminant system of the process X X X (t).
Let us recall that the process X X X (t) can be represented as X X X (t) = (t)c c c, t ∈ I . Now let us consider the discriminant coordinates of the random vector c c c. The kth discriminant coordinate U * k = <ω ω ω k , c c c> of this vector satisfies the condition: subject to the restriction The expression (γ k , ω ω ω k ) will be called the kth discriminant system of the random vector c c c.
Theorem 2 The kth discriminant system (λ k , u u u k (t)) of the stochastic process X X X (t) is related to the kth discriminant system (γ k , ω ω ω k ) of the random vector c c c by the equations: Proof We assume, that the vector weight function u u u(t) and the process X X X (t) are in the same space, i.e. the function u u u(t) can be written in the form: Hence the between-class variance of the inner product <u u u, X X X > is and the total variance where Var B (c c c) and Var T (c c c) are respectively the matrices of sum of squares and products of between-class and total variance. For the first functional discriminant coordinate of the process X X X (t) we have: where ω ω ω 1 Var T (c)ω ω ω 1 = 1. This is equivalent to where ω ω ω 1 ω ω ω 1 = 1. This meets the definition of the first discriminant coordinate of the random vector c c c. On the other hand, if the first discriminant system (γ 1 , ω ω ω 1 ) defines the first discriminant coordinate of the random vector c c c, we will obtain the first discriminant system for the process X X X (t) from the equations We may extend these considerations to the second discriminant system and so on.
The matrices Var B (c c c) and Var T (c c c) are unknown and must be estimated based on the sample. Let n lc c c lc c c l , where n = L l=1 n l . Next we find non-zero eigenvaluesγ 1 ≥γ 2 ≥ ... ≥γ s and the corresponding eigenvectorsω . Furthermore the kth discriminant system of the random process X X X (t) has the following form: Hence the coefficients of the projection of the ith realization x x x li (t) of process X X X (t) belonging to the lth class on the kth functional discriminant component are equal to:

Canonical correlation analysis for multivariate functional data
Suppose now that we are observing two random vectors Y Y Y = (Y 1 , Y 2 , ..., Y p ) ∈ R p and X X X = (X 1 , X 2 , ..., X q ) ∈ R q and looking for the relationship between them. This is the one of the main problems of canonical correlation analysis. We search for weight vectors u u u ∈ R p and v v v ∈ R q , such that the linear combinations called the first pair of canonical variables, are maximally correlated.
Canonical correlation analysis has been extended to the case of multivariate time series (Brillinger 2001), under the assumption of stationarity, and extension of canonical correlation to functional data has been proposed in Leurgans et al. (1993), where the need for regularization was pointed out. He et al. (2000) showed that random processes with finite basis expansion have simple canonical structures, analogously to the case of random vectors. This motivates to implement regularization by projecting random processes on a finite number of basis function. The idea to project processes on the finite k basis functions has been discussed in He et al. (2004). This projection is on a prespecified orhonormal basis. In this Section, we consider the canonical correlations for the multivariate functional data. The proposed method is a generalization of the method presented in He et al. (2004). Other generalization is presented by Dubin and Müller (2005).
Let Y Y Y (t) and X X X (t) are stochastic processes. We will further assume that Y Y Y ∈ L p 2 (I 1 ), X X X ∈ L q 2 (I 2 ) and each component Y g (t) of process Y Y Y (t) and X h (t) of process X X X (t) can be represented by a finite number of orthonormal basis functions {ϕ e } and {ϕ f } respectively: Moreover let E(Y Y Y ) = 0 0 0, E(X X X ) = 0 0 0. This fact does not cause loss of generality, because functional canonical variables are calculated based on the covariance functions of processes Y Y Y (t) and X X X (t).
We introduce the following notation: where ϕ ϕ ϕ E 1 , ..., ϕ ϕ ϕ E p and ϕ ϕ ϕ F 1 , ..., ϕ ϕ ϕ F q are orthonormal basis functions of space L 2 (I 1 ) and L 2 (I 2 ), respectively, and K 1 = E 1 + E 2 + ... + E p , K 2 = F 1 + F 2 + ... + F q . Using the above matrix notation the processes Y Y Y (t) and X X X (t) can be represented as: Functional canonical variables U and V for stochastic processes Y Y Y (t) and X X X (t) are defined as follows where the vector functions u u u(t) and v v v(t) are called the vector weight functions. The weight functions u u u(t) and v v v(t) are chosen to maximize the coefficients subject to the constraint that The coefficient ρ is called the canonical correlation coefficient. However, simply carrying out this maximization does not produce a meaningful result. The correlation ρ achieved by the function u u u(t) and v v v(t) is equal to 1. The canonical variate weight functions u u u(t) and v v v(t) do not give any meaningful information about the data and clearly demonstrate the need for a technique involving smoothing. A straightforward way of introducing smoothing is to modify the constraints (5) by adding roughness penalty terms to give (Ramsay and Silverman 2005): where the roughness function PEN 2 is the integrated squared second derivative Assuming that the vector weight function u u u(t) and the process Y (t) are in the same space, i.e. the function u u u(t) can be written in the form: Similarly assuming that v v v(t) = 2 (t)ν ν ν we can obtain Now the first functional canonical correlation ρ 1 and corresponding vector weight functions u u u 1 (t) and v v v 1 (t) are defined as , subject to the constraint that In general, the kth functional canonical correlation ρ k and the associated vector weight functions u u u k (t) and v v v k (t) are defined as follows: where u u u k (t) and v v v k (t) are subject to the restrictions (6) and (7), and the kth pair of canonical variables (U k , V k ) is not correlated with the first k − 1 canonical variables, where are canonical variables. We refer to this procedure as smoothed canonical correlation analysis. The expression (ρ k , u u u k (t), v v v k (t)) will be called the kth canonical system of the pair of processes Y Y Y (t) and X X X (t). Let Var(α α α) = E(α α αα α α ) = 11 , Let us consider the canonical variables U * = <ω ω ω, α α α> and V * = <ν ν ν, β β β> of random vectors α α α and β β β respectively. The kth canonical correlation γ k and associated vector weights ω ω ω k and ν ν ν k are defined as Cov(<ω ω ω, α α α>, <ν ν ν, β β β>) = ω ω ω k 12 ν ν ν k , subject to the restriction where R R R 1 and R R R 2 are given by (8) and (9) respectively, and the kth canonical variables (U * k , V * k ) are not correlated with the first k − 1 canonical variables. The expression (γ k , ω ω ω k , ν ν ν k ) will be called the kth canonical system of the random vectors α α α and β β β.

Theorem 3 The kth canonical system
) of the pair of random processes Y Y Y (t) and X X X (t) is related to the kth canonical system (γ k , ω ω ω k , ν ν ν k ) of the pair of the random vectors α α α and β β β by the equations: Proof Without loss of generality we may assume that the covariance matrices 11 and 22 are of full column rank. As in the proof of Theorem 1 it may be assumed, that the vector weight function u u u(t) and the process Y Y Y (t) are in the same space, i.e. the function u u u(t) can be written in the form: Let us consider the first canonical correlation between the processes Y Y Y (t) and X X X (t): where u u u(t) and v v v(t) are subject to the restrictions (6) and (7). This is equivalent to saying that subject to the restriction where R R R 1 and R R R 2 are given by (8) and (9) respectively. This is the definition of the first canonical correlation between the random vectors α α α and β β β.
On the other hand, if we begin with the first canonical system (γ 1 , ω ω ω 1 , ν ν ν 1 ) of the pair of random vectors α α α and β β β, we will obtain the first canonical system for the processes Y Y Y (t) and X X X (t) from the equation We may extend these considerations to the second canonical system and so on.
Canonical correlation analysis for the random vectors α α α and β β β is based on the matrices 11 , 22 and 12 , which are unknown. We estimate them on the basis of n independent realizations y y y 1 (t), y y y 2 (t), ...., y y y n (t) of the form y y y i (t) Finally the estimators of the matrices 11 , 22 and 12 have the form: 22ˆ 21 , whereˆ 21 =ˆ 12 . The matricesĈ C CD D D andD D DĈ C C have the same nonzero eigenvaluesγ 2 k , and their corresponding eigenvectorsω ω ω k and ν ν ν k are given by the equations: Hence the coefficients of the projection of the ith realization y y y i (t) of process Y Y Y (t) on the kth functional canonical variable are equal tô Analogously the coefficients of the projection of the the ith realization x x x i (t) of process X X X (t) on the kth functional canonical variable are equal tô where i = 1, 2, . . . , n, k = 1, . . . , min(K 1 + p, K 2 + q).

Example
The following data (Fig. 1) 1972 1977 1982 1987 1992 1997 2002 2007 1972 1977 1982 1987 1992 1997 2002 2007 1972 1977 1982 1987 1992 1997 2002 2007 Upper−middle−income High−income Lower−middle−income Low−income 20 40 60 80 100 1972 1977 1982 1987 1992 1997 2002 2007 Upper−middle−income High−income Lower−middle−income Low−income Fig. 1 Data set trajectories U.S. dollars. GDP is the sum of gross value added by all resident producers in the economy plus any product taxes and minus any subsidies not included in the value of the products. It is calculated without making deductions for depreciation of fabricated assets or for depletion and degradation of natural resources. 2. X 2 : Energy use (rate of growth in kg of oil equivalent per capita)-Energy use refers to use of primary energy before transformation to other end-use fuels, which is equal to indigenous production plus imports and stock changes, minus exports and fuels supplied to ships and aircraft engaged in international transport. 3. X 3 : CO 2 emissions (rate of growth in kt)-Carbon dioxide emissions are those stemming from the burning of fossil fuels and the manufacture of cement. They include carbon dioxide produced during consumption of solid, liquid, and gas fuels and gas flaring. 4. X 4 : Population in urban agglomerations of more than 1 million (% of total population)-Population in urban agglomerations of more than one million is the percentage of a country's population living in metropolitan areas that in 2000 had a population of more than one million people.

Population
The data were transformed to functional data by the method described in Sect. 2. The calculations were performed using the Fourier basis system what is a typical selection. But others such as splines, polynomials or wavelets can also be used. Optimum values of B, selected using the BIC criterion, for X 1 , X 2 , X 3 and X 4 take the values 2, 2, 2 and 6 respectively. The time interval [0, T ] = [0, 38] was divided into moments of time in the following way: t 1 = 0.5(1972), t 2 = 1.5(1973), ..., t 38 = 37.5(2009).
We used the R package fda (Ramsay et al. 2009) to create Fourier system and in order to convert raw data into a functional object. Other procedures were implemented by us (Table 1).

Multivariate functional principal component analysis (MFPCA)
The statistical objects in the functional principal component analysis are 54 countries (n = 54) characterized by four ( p = 4) pieces of functional data , t ∈ [0, 38], i = 1, 2, ..., 54. No account is taken of an objects membership of one of the four defined groups of countries. The vector functions x x x 1 (t), x x x 2 (t), ..., x x x 54 (t) have the form where (t) is a matrix with the form of (3), and the vectorĉ c c i has the form where B 1 = B 2 = B 3 = 2, B 4 = 6, i = 1, 2, ..., 54. In the first step, from the vectorsĉ c c 1 ,ĉ c c 2 , ...,ĉ c c 54 we build the matrixˆ , and next we find its eigenvaluesγ k and the corresponding vectorsω ω ω k . The ratios of the particular eigenvalues to the sum of all eigenvalues, expressed as percentages, are shown in Fig. 2. It can be seen from Fig. 2 that 94.8 % of the total variation is accounted for by the first functional principal component. In the second step we form the vector weight functionŝ where k = 1, ..., 16, and the corresponding functional principal components in the formÛ k = <û u u k , X X X >.
The graphs of the four components of the vector weight functions for the first and second functional principal components appear in Fig. 3. The values of the coefficients of the vector weight functions corresponding to the first and second functional principal components are given in Table 2. At a given time point t the greater is the absolute value of a component of the vector weight function, the greater is the contribution, in the structure of the given functional principal component, from the process X (t) corresponding to that component. From Fig. 3 (left) it can be seen that the greatest contribution in the structure of the first functional principal component comes from process X 4 (t), and this holds for all of the observation years considered. Figure 3 (right) shows that, on specified time intervals, the greatest contribution in the structure of the second functional principal component comes alternately from the processes X 2 (t) and X 1 (t). The total contribution of a particular original process X i (t) in the structure of a given functional principal component is equal to the area under the module weighting function corresponding to this process. These contributions for the four components of the vector process X X X (t), and the first and second functional principal components are given in Table 2. The relative positions of the 54 countries in the system of the first two functional principal components are shown in Fig. 4. The system of the first two functional principal components retains 96.3 % of the total variation. From Fig. 4, we see that 54 countries form a relatively homogeneous group, with the exception of Singapore (SGP), Korea Rep. (KOR) and China (CHN).

Multivariate functional discriminant coordinates (MFDCA)
In the construction of functional discriminant coordinates, by contrast with the construction of functional principal components, we take account additionally of the information concerning the division of the 54 countries into four disjoint groups (L = 4). From the vectorsĉ c c i we build the estimatorB B B of the matrix of betweenclass variation, and the estimatorT T T of the matrix of total variation, and then we find the non-zero eigenvaluesγ k of the matrixT T T −1B B B and the corresponding vectorŝ ω ω ω k , k = 1, 2, 3. The ratios of particular eigenvalues to the sum of all eigenvalues, expressed as percentages, are shown in Fig. 5. It can be seen from Fig. 5 that 74.7 % of the total variation is accounted for by the first functional discriminant coordinate. In the second step we form the vector weight functionŝ where k = 1, 2, 3, and the corresponding functional discriminant coordinates in the formÛ The graphs of the four components of the vector weight function for the first and second functional discriminant coordinates appear in Fig. 6. The values of the coefficients of the vector weight functions corresponding to the first and second functional discriminant coordinates are given in Table 3. At a given time point t the greater is the absolute value of a component of the vector weight function, the greater is the contribution, in the structure of the given functional discriminant coordinate, from the process X (t) corresponding to that component. Figure 6 (left) shows that the greatest contribution in the structure of the first and second functional discriminant coordinates comes from process X 4 (t), and this holds for all of the observation years considered. Similarly as in the case of the functional principal components, the total contribution of a particular original process X i (t) in the structure of a particular functional discriminant coordinate can be estimated using the area under the module weighting function corresponding to this process. These contributions for the four components of the vector process X X X (t) and the first and second functional discriminant coordinates are given in Table 3. The relative positions of the 54 countries in the system of the first two functional discriminant coordinates are shown in Fig. 7. The system of the first two functional discriminant coordinates retains 93.9 % of the total variation. Compared with the projection onto the first two functional principal components, the division into four groups is better visible here. State clearly different from the other countries is the Democratic Republic of Kongo (COD). In the group of developed countries, Finland (FIN) is clearly different from other countries.

Multivariate functional canonical analysis (MFCCA)
In the construction of functional canonical variables we do not take account of the division of the 54 countries into four groups, and we divide the four-dimensional stochastic process into two parts: Y Y Y (t) = (X 2 (t), X 3 (t)) and X X X (t) = (X 1 (t), X 4 (t)) .
In our case p = q = 2. We are interested in the relationship between the processes Y Y Y (t) and X X X (t). We build estimators of the matrices 11 , 22 and 12 , and we then find the non-zero eigenvaluesγ 2 k and corresponding vectorsω ω ω k of the matrixĈ C CD D D, and the eigenvaluesγ 2 k and corresponding vectorsν ν ν k of the matrixD D DĈ C C, whereĈ C C =ˆ −1 11ˆ 12 andD D D =ˆ −1 22ˆ 21 ,ˆ 21 =ˆ 12 , k = 1, ..., 6. The eigenvaluesγ k , called canonical correlations, are shown in Fig. 8. In the second step we form vector weight functionŝ corresponding to the processes Y Y Y (t) and X X X (t), where k = 1, ..., 6.. Corresponding to these functions are the functional canonical variables in the form corresponding to the processes Y Y Y (t) and X X X (t). The graphs of the two components of the vector weight function for the first and second functional canonical variables of the processes Y Y Y (t) and X X X (t) are shown in Figs. 9, 10. Table 4 contains the values of the coefficients of the vector weight functions, together with the total contribution from each process in the structure of the corresponding functional canonical variable. The relative positions of the 54 countries in the systems (Û 1 ,V 1 ) of functional canonical variables are shown in Fig. 11. The strong correlation (ρ 1 = 0.951) between the processes X X X (t) and Y Y Y (t) means that in the system of canonical variables (Û 1 ,V 1 ) points representing individual countries are almost on a straight line. In terms of the correlation between the processes X X X (t) and Y Y Y (t), 54 countries form a relatively homogeneous group with the exception of Singapore (SGP) and Saudi Arabia (SAU).

Conclusions and future work
This paper introduces and analyzes a new methods of constructing canonical variables and discriminant coordinates for multivariate functional data. In addition, we reviewed principal components analysis for such data (Jacques and Preda 2014). FDA is an important tool that can be used for exploratory data analysis. A primary advantage is the ability do assess continuous data without reducing the signal into discrete variables. By representing each curve as a function, it is possible to use functional analogue of classical methods. Functional methods (1) allow more complex dynamics than classical methods; (2) they utilize a nonparametric smoothing technique to reduce the observational error; and (3) they solve the inverse and multicollinearity problems caused by the "curse of dimensionality".
Proposed methods was applied to geographic economic multivariate time series. Our research has shown, on this example, that the use of a multivariate projective dimension reduction techniques gives good results and provide an attractive method for flexibly analyse such data. Of course, the performance of the algorithms needs to be further evaluated on additional real and artificial data sets.
In a similar way, we would like to extend manifold dimension reduction techniques like multidimensional scaling (Borg and Groenen 2005), isometric feature mapping (Tenenbaum et al. 2000) or maximum variance unfolding (Weiss 1999) for univariate functional data to multivariate case. This is the direction of our future research.