Functional principal component analysis for incomplete space-time data

Environmental signals, acquired, e.g., by remote sensing, often present large gaps of missing observations in space and time. In this work, we present an innovative approach to identify the main variability patterns, in space-time data, when data may be affected by complex missing data structures. We formalise the problem in the framework of Functional Data Analysis, proposing an innovative method of functional Principal Component Analysis (fPCA) for incomplete space-time data. The functional nature of the proposed method permits to borrow information from measurements observed at nearby spatio-temporal locations. The resulting functional principal components are smooth fields over the considered spatio-temporal domain, and can lead to interesting insights in the spatio-temporal dynamic of the phenomenon under study. Moreover, they can be used to provide a reconstruction of the missing entries, also under severe missing data patterns. The proposed model combines a weighted rank-one approximation of the data matrix with a roughness penalty. We show that the estimation problem can be solved using a Majorize-Minimization approach, and we provide a numerically efficient algorithm for its solution. Thanks to a discretization based on finite elements in space and B-splines in time, the proposed method can handle multidimensional spatial domains with complex shapes, such as water bodies with complicated shorelines, or curved spatial regions with complex orography. As shown by simulation studies, the proposed space-time fPCA is superior to alternative techniques for Principal Component Analysis with missing data. We further highlight the potentiality of the proposed method for environmental problems, by applying space-time fPCA to the study of the Lake Water Surface Temperature (


Introduction
In environmental and ecological sciences, it is of fundamental interest to analyze signals acquired across space and time, using remote sensing or other measuring devices.However, such signals are often only partially observed, over the spatio-temporal domain, and may present complex missing data patterns.Air pollution datasets, for instance, often display a high percentage of missing values, due to faults in the measuring devices.Satellite remote sensing data, that can be used to explore vegetation indices or surface temperature over lands, seas or lakes, are often affected by large gaps, in space and time, which might be caused, e.g., by ice coverage, presence of clouds or other meteorological conditions.Figure 1 offers an example in this sense.We here display the spatio-temporal profile of the water surface temperature of Lake Victoria, in Central Africa.These data, analyzed, for example, by Gong et al. (2018) and Gong et al. (2021), are provided by the ARC-Lake database (see, e.g., MacCallum and Merchant, 2011).Although the data consist of monthly averaged measurements, data may be missing for many consecutive months, on large portions of the lake.
In this work, we are interested in investigating the main patterns of variability in spatio-temporal signals, which may be affected by complex missing data structures, such as those highlighted above.We do so in the framework of Principal Component Analysis (PCA).In this respect, it should be noted that the presence of missing data may challenge or invalidate standard approaches to PCA.For this reason, alternative strategies have been explored in the literature, to perform PCA with missing data, relying on iterative procedures, which combine PCA with missing data imputation.These iterative PCA techniques are motivated by the results of, e.g.Gabriel and Zamir (1979) and Kiers (1997), in the context of a weighted low-rank approximation.For example, the Data INterpolating Empirical Orthogonal Function (DINEOF) method (Beckers and Rixen, 2003) updates its reconstruction of the missing entries by Singular Value Decomposition on the imputed data, until convergence.An analogous technique is proposed by Josse et al. (2011) and Josse and Husson (2012), who describe a regularized iterative PCA algorithm, which reduces the possibility of overfit.These approaches are extensively employed in environmental and ecological applications, where satellite remote sensing data are of interest (see, e.g., Hilborn and Costa, 2018;Wang and Liu, 2014;Alvera-Azcárate et al., 2007, 2005).DINEOF is arguably the most popular approach in these fields.However, these techniques work on a multivariate assumption, and do not take advantage of the spatio-temporal nature of the phenomena under study.2 Fig. 1: Monthly averaged satellite measurements of Lake Water Surface Temperature (LWST), Lake Victoria, Central Africa.Top left: map of Lake Victoria.Top right and center: LWST in the months of August 1996, May 2003, and March 2011.Bottom: LWST at the 5 spatial locations in the lake, indicated by color markers in the map in the top-left panel.
Here, we propose an innovative PCA method for incomplete spatio-temporal signals.To appropriately borrow information from measurements observed at nearby spatio-temporal locations, we formalize the problem of PCA in a Functional Data Analysis framework (Ramsay and Silverman, 2005;Ferraty, 2006;Kokoszka and Reimherr, 2017).Functional PCA approaches for space-time data are considered, for instance, in Li and Guan (2014), where a method based on Poisson maximum likelihood is proposed to provide an estimation of the covariance function for the spatio-temporal data generation process, from which principal components are extracted by means of an eigenvalue decomposition.In Liu et al. (2017) the authors develop a technique for the functional principal component analysis of spatially correlated functional data, which is then used as curve reconstruction method in the context of partially observed functional data.
Our proposal of functional Principal Component Analysis (fPCA) originates from a different literature, based on penalized rank-one approximations of the data matrix (see, e.g., Huang et al., 2008).In particular, we consider an estimation functional that combines a weighted rank-one approximation of the data matrix with roughness penalties based on partial differential operators over space and time.The obtained functional principal components are smooth fields over the considered spatio-temporal domain.They are easy to interpret and can lead to interesting insights in the spatiotemporal dynamic of the phenomenon under study.Moreover, they can be used to provide a reconstruction of the missing entries, also under severe missing data patterns.
To minimize the considered fPCA estimation functional, we develop an appropriate Majorization-Minimization algorithm (see, e.g., Lange, 2016).This approach is used in a variety of statistical methods, such as multidimensional scaling (Heiser, 1987a) and correspondence analysis (Heiser, 1987b).In particular, the popular Expectation-Maximization method, widely used in all areas of statistics, is a special case of the Majorization-Minimization approach (Lange and Zhou, 2022).An interesting property of these optimization approaches is that they guarantee convergence to a local optimum (Wu, 1983).For the proposed fPCA problem, we show that the estimation functional of fPCA with incomplete space-time data can be majorized by the estimation functional of fPCA with fully observed space-time data.Moreover, the latter estimation problem can be seen as an extension to space-time settings of the fPCA approaches considered by Lila et al. (2016) and Arnone et al. (2023) over spaceonly domains.We discretize the estimation problem using B-splines over the temporal domain, and finite elements defined over a triangulation of the spatial domain.This enables the methodology to deal with data observed over spatial domains with complex shapes, such as water bodies with complicated shorelines, or curved spatial regions with complex orography.The proposed fPCA is a new addition to the class of Physics-Informed Spatial and Functional Data Analysis methods, reviewed in Sangalli (2021), and is implemented in the R/C++ library fdaPDE (Arnone et al., 2023).
Simulation studies demonstrate the good performances of the proposed fPCA for incomplete space-time data, and its advantages over state-of-the-art techniques for PCA with missing data.These simulation studies consider different incomplete data settings, including sparse data and data with large gaps in space and time, as in the case of the considered application to the study of water surface temperature of Lake Victoria.
The paper is organized as follow.Section 2 introduces the proposed fPCA for incomplete spatio-temporal data.Section 3 describes the discretization of the estimation problem.Section 4 details the Majorize-Minimization algorithm for the minimization of the proposed estimation functional.Section 5 reports the simulation studies, that compare the proposal to popular approaches for PCA with missing data.Section 6 illustrates the application to the study of the surface water temperature of Lake Victoria.Some concluding remarks are drawn in Section 7. All proofs are deferred to A and B.

Mathematical framework
In this section we introduce the fPCA problem for incomplete space-time data.In Section 2.1 we give the theoretical definition of functional principal component analysis, for a random field defined over a spatio-temporal domain.In Section 2.2 we introduce the fPCA estimation problem for incomplete space-time data.

Functional Principal Component Analysis of a random field on a spatio-temporal domain
Let D be a bounded and possibly non-convex subset of R d , with d = 2, 3, and let T ⊂ R be a time interval.Introduce the space of square-integrable functions on the spatio-temporal domain and assume it has a finite second moment, i.e., D×T E[X 2 ] < ∞, and a square integrable covariance function K (p 1 , t 1 ), (p 2 , t 2 ) .Define the covariance operator V : Thanks to Mercer's Lemma (Riesz and Nagy, 2012), there exists an orthonormal sequence {f k } k of eigenfunctions and a nonincreasing sequence {ξ k } k of eigenvalues such that the following eigenvalue problem holds , where {s [k] } k is a sequence of zero-mean uncorrelated random variables, with s [k] = D×T (X − µ)(p, t)f k (p, t).The functions {f k } k are named Principal Component (PC) functions, whereas the random variables {s [k] } k are named Principal Component scores.PC functions {f k } k identify the strongest modes of variation in the random field X .In fact, it can be shown that f 1 is such that and subsequent PCs f k , with k > 1, solve the same problem with the additional constraint ⟨f k , f h ⟩ L 2 (D×T ) = 0, for h = 1, . . ., k − 1, i.e., f k must be orthogonal to all the previous PCs.
Another characterization of the PCs goes under the name of best M -basis approximation property: for any positive integer M , the first M PCs satisfy where δ kh denotes the Kronecker delta, i.e., δ kh = 1 if and only if k = h, and δ kh = 0 otherwise.

fPCA estimation problem for incomplete space-time data
Assume L realizations x 1 , x 2 , . . ., x L of the random field X were available, observed continuously over the spatio-temporal domain D × T , and without noise.We could then compute the sample covariance operator V and obtain the first M PCs from its (numerical) spectral decomposition.
In real-world applications, however, we never observe realizations of the random field X continuously over D ×T and without noise, but only their noisy measurements, at some spatio-temporal locations.Specifically, let p 1 , . . ., p n , be n locations in the spatial domain D, and t 1 , . . ., t m , be m time instants in T .Denote by x lij the noisy measurement of the l-th statistical unit x l at the spatio-temporal location (p i , t j ).In the Functional Data Analysis community, a common approach to solve the fPCA problem, starting from the noisy and discrete measurements of the statistical units, consists in first obtaining smooth representatives of x 1 , x 2 , . . ., x L , by appropriate smoothing procedures, and then computing the resulting sample covariance operator, with its spectral decomposition.However, such a pre-smoothing approach may fail in the context of missing data, especially in the presence of complex missing data patterns, as highlighted in Palummo et al. (2023).
We here follow a different approach that starts from the characterization of the PCs given in Equation (1).Specifically, for l = 1, . . ., L, let O l be the set of all index pairs (i, j) where x lij is not missing.Assume, for simplicity of exposition, that the data have already been centered around the mean, at each spatio-temporal location.Then, the sample version of the objective functional in Equation (1), for M = 1, is given by (2) The estimation of the infinite-dimensional PC function f , starting from the discrete measurements x lij , through minimization of (2), is though an ill-posed problem, unless a proper regularization is introduced.To this end, we add to the objective functional (2) a proper regularizing term, which seeks smoothness in the PC function f .In particular, following the approach presented by Bernardi et al. (2017) in the context of spatio-temporal smoothing, we consider the space-time roughness penalty where ∆f = denotes the Laplacian of f , while λ D > 0 and λ T > 0 are two smoothness parameters, which control the roughness of the PC function in space and time.Therefore, we propose to estimate the first PC function f 1 : D × T → R and the corresponding PC scores vector s 1 ∈ R L by solving the following minimization problem arg min where H ⊂ L 2 (D × T ) is a proper space of functions where the objective functional is well-posed (see, e.g., Arnone, 2018).As discussed in Section 4.1, the first term of the objective functional in Equation ( 4) corresponds to a weighted rank-one approximation of the data matrix, and encourages the PC function f to capture the strongest mode of variation in the observed data.The second term controls the smoothness of f in space and time.Moreover, the term s ⊤ s is justified by invariance considerations on the objective functional, similar to what was done in Huang et al. (2008), while the normalization constraint ∥s∥ 2 = 1 is set to make the representation unique.Subsequent PCs are estimated by solving the same estimation problem, but where missing entries have been imputed, as detailed in Section 4.2.
The estimation problem (4) presents various challenging aspects.First of all, it is an infinite-dimensional estimation problem, involving the infinite-dimensional unknown f , and it does not enjoy a closed form solution.This calls for an appropriate numerical discretization that will be presented in Section 3. Second, it is non-convex in (s, f ).This requires the development of an appropriate iterative scheme, that will be detailed in Section 4. In this respect, the presence of missing data raises another complication.Indeed, the iterative approaches formerly considered for fPCA for fully observed data by, e.g., Zou et al. (2006); Huang et al. (2008); Lila et al. (2016); Arnone et al. (2023), and explored in Stefanucci et al. (2018) in the context of partially observed functional data, may instead be inadequate in the presence of complex missing data patterns, which requires the more complex iterative scheme proposed in Section 4.

Discretization of the infinite dimensional problem
We here present a numerical discretization of the functional in Equation ( 4) which allow us to consider spatial domains with complex shapes, such as water bodies with complicated shorelines or curved spatial regions with complex orography.This discretisation is based on finite elements in space and splines in time.

Spatio-temporal basis system
Let T be a triangulation of the spatial domain of interest D, i.e., a finite union of nonoverlapping triangles approximating D (Hjelle and Daehlen, 2006).The left panel of Figure 2 shows an example.We define, on such triangulation, the space of finite element functions V r T (D) as the space of continuous functions which are polynomials of degree r, once restricted to any triangle in T ; see, e.g.Ciarlet (2002); Quarteroni (2017).For simplicity, in this work, we consider the case of linear finite elements (r = 1).Let ξ 1 , ξ 2 , . . ., ξ N D be the nodes of the triangulation that, for linear elements, coincide with the vertices of the triangles in T .Therefore, we can introduce a basis system ψ 1 (p), ψ 2 (p), . . ., ψ N D (p) for V r T (D), where each basis function is Lagrangian, that is, ψ k (ξ h ) = 1 if and only if k = h, and ψ k (ξ h ) = 0 otherwise.The left panel of Figure 2 shows a linear finite element basis, defined over a triangulation of Lake Victoria.Let ψ = (ψ 1 , ψ 2 , . . ., ψ N D ) ⊤ be the N D -vector of finite element basis.Any function v ∈ V r T (D) can be written as a finite linear combination of these basis, i.e., v(p) = v ⊤ ψ(p).

An interesting property of Lagrangian elements is
For the time dimension, we adopt a set of N T cubic B-spline basis functions ϕ 1 (t), ϕ 2 (t), . . ., ϕ N T (t), defined over the time interval T ; see, e.g.De Boor (1978).The right panel of Figure 2 shows such a basis system.
Finally, we represent the spatio-temporal PC function f (p, t) over D × T as where {c kh } kh are the coefficients of the expansion of f with respect to the considered spatio-temporal basis.

Discretization of the differential penalty
Define the n × N D matrix Ψ = [Ψ] ik = ψ k (p i ) of the evaluation of the N D finite element basis at the n spatial locations, and the Moreover, define the m × N T matrix Φ = [Φ] jk = ϕ k (t j ) of the evaluation of the N T spline basis functions at the m temporal locations, and the N T × N T matrices Let ⊗ be the Kronecker tensor product between matrices, and set Ψ = Ψ ⊗ Φ.Moreover, introduce the vector c = (c 11 , c 12 , . . ., c . Following the approach in Bernardi et al. (2017), we can discretize the differential penalty in Equation ( 3) by 4 Iterative solution of the estimation problem In this section, we propose an iterative procedure, in the family of Majorize-Minimization (MM) algorithms (see, e.g., Lange, 2004;Lange and Zhou, 2022), which permits the efficient numerical solution of the estimation problem (4), and hence the identification of the principal component, and corresponding scores, from a set of partially observed space-time data.

Data loss
Before introducing the MM algorithm that solves Equation (4), we highlight that the first term in (4), i.e., the data loss term, can be interpreted as a weighted rank-one approximation of the data matrix.To this end, let X be the L × nm data matrix, whose l-th row contains the noisy measurements of the l-th statistical units, at the nm spatio-temporal locations, i.e., (x l11 , x l12 , . . ., x lnm ).Denote by W the binary matrix L × nm, whose l-th row (w l11 , w l12 , . . ., w lnm ) has w lij = 1 if and only if (i, j) ∈ O l , that is, when the datum x lij is observed and w lij = 0 otherwise.Denote instead by W C the binary matrix L × nm with zeros indicating observed values and ones indicating missing observations.Finally, let ∥ • ∥ F be the Frobenious norm, and * be the Hadamard (or element-wise) product between matrices.Then, the data loss term in Equation ( 4) can be written as where s = (s 1 , s 2 , . . ., s L ) ∈ R L is the scores vector, and f nm = (f (p 1 , t 1 ), f (p 2 , t 1 ), . .., f (p n , t m )) ⊤ ∈ R nm is the vector of the evaluations of the PC function f at the nm spatio-temporal data locations, i.e., f nm = Ψc.The formulation on the right-hand side of ( 7) is not uncommon in multivariate analysis, where the associated minimization problem is usually formalized as an approximation problem of the data matrix X by another matrix of lower rank (see, e.g., Gabriel and Zamir, 1979).For the unweighted case, i.e., in the special case where the data are fully observed and W is a matrix of all ones, the Eckart-Young-Mirsky theorem (Eckart and Young, 1936) guarantees that the best rank-M matrix, with M ≥ 1, approximating X is provided by its Singular Value Decomposition (SVD).For a general weight matrix W , even if binary, there is no analytic solution, and the problem is solved resorting to iterative methods, such as, for instance, Non-linear Iterative Partial Least Squares (NIPALS) (Wold, 1966) or Criss-Cross regression (Gabriel and Zamir, 1979).

Majorize-Minimization scheme
Thanks to Equations ( 6) and ( 7), we can rewrite the objective functional in (4) as where Tr[•] denotes the matrix trace operator.Now, define the L × N D N T matrix U = sc ⊤ .Noting that sf ⊤ nm = U Ψ⊤ , we can further rewrite (8) as a functional of U as At this point, we show that ( 9) can be minimized in U by an appropriate Majorize-Minimization (MM) scheme.An MM procedure seeks to minimize an objective function h : U → R, where U denotes some parameter space, by iterative minimization of a simpler function, whose optimization is more computationally tractable.In particular, starting from an initial guess U 0 , an MM algorithm builds a sequence U 1 , U 2 , . . .U s in U, which converges to a local optimum of the objective functional h(•) (see, e.g.Wu, 1983).For each iteration index s ≥ 0, U s+1 is selected as the minimizer of a function g(U |U s ), which is taken to be a majorization of h(•) at U s , that is, such that g(U |U s ) ≥ h(U ) for all U ̸ = U s , with the additional condition h(U s ) = g(U s |U s ).Under this update rule, an MM procedure forces h(U ) to decrease, as we have The next result shows that a majorizing functional for the objective h(U ) in ( 9) corresponds to the estimation functional of a fPCA on completely observed space-time data.
Proposition 1.The functional in Equation ( 9) is majorized by where ζ ∈ R is a constant not depending on U , and Y s = W * X + W C * U s Ψ⊤ is the data matrix obtained by imputing the missing observations in X with the reconstructed signal U s , provided by the PCs estimated at the s-th iteration.
The proof is reported in Appendix A. According to Proposition 1, at each iteration of the MM procedure, we have to solve the following minimization problem.arg min This is an fPCA problem for completely observed spatio-temporal functional data, which extends, to the space-time setting, the models presented in Lila et al. (2016) and Arnone et al. (2023).Section 4.3 details the numerical algorithm to solve (11) and extract the first M functional principal components from the imputed data Y s .
Once the PCs of Y s are estimated, the reconstructed signal U s is updated, and the minimization problem ( 11) is repeated in Y s+1 , until convergence.
In the fPCA approaches for fully observed data, described in Lila et al. (2016); Arnone et al. (2023); Huang et al. (2008), the first M principal components are computed one at a time, solving problems similar to (11), on residualized data matrices, as detailed in Section 4.3, without any need for the MM algorithm proposed here.However, the missing data scenario here considered is much more challenging.Indeed, in this setting, we have to recursively apply the MM procedure, and repeat the estimation of all the first M PCs, at each iteration of the MM algorithm, from the imputed data Y s , using as a starting point U 0 the reconstructed signal U s , obtained at convergence of the previous M − 1 PCs.This recursive procedure permits to improve the quality of the M -th estimated PC, and the overall signal reconstruction, while preserving the same quality of estimation on the first M − 1 PCs.

Minimization of the majorizing functional
We solve the estimation problem (11) by an iterative two-step algorithm, where we alternate the estimation of s given c, and the estimation of c given s.This iterative scheme is based on the following results.Proposition 2. (Estimation of s given c).Given c ∈ R N D N T , there exists a unique estimator ŝ ∈ R L , with ∥ŝ∥ 2 = 1, which solves (11).Moreover, Proposition 3. (Estimation of c given s).Given s ∈ R L , with ∥s∥ 2 = 1, there exists a unique estimator ĉ ∈ R N D N T which solves (11).Moreover, The proofs of Propositions 2 and 3 are reported in Appendix B. Proposition 2 highlights that the problem of estimating the scoring vector ŝ is equivalent to that of finding the scores, given the loadings, in multivariate PCA.Proposition 3 shows that the problem of estimating ĉ, given the scores, corresponds to the problem of estimating a smooth field, starting from noisy observations at the spatio-temporal data locations (p i , t j ).
It is worth noting that the problem ( 11) is nonconvex in (s, f ).However, Proposition 2 states the uniqueness of the minimizer ŝ, given c, while Proposition 3 guarantees the uniqueness of the minimizer ĉ, given s.This implies that the objective in Equation ( 11) is non-decreasing under the update rule of the proposed algorithm.
Subsequent principal components of the complete data matrix Y s are computed by solving the estimation problem in Equation ( 11) on the residual matrix Y s − sf ⊤ nm .

Selection of the optimal smoothing parameters
The presence of the pair of smoothing parameters, λ D and λ T , in the penalty term, allows for a further degree of flexibility of the modeling, as we can select the optimal level of smoothing, in space and time, of the PC functions.The accurate selection of these smoothing parameters is crucial to achieve optimal results.Too high values of the smoothing parameters can lead to oversmoothed solutions, leaving meaningful patterns in the residuals.In contrast, a too low value causes the estimated PCs to also fit the noise.We select the optimal pair of smoothing parameters by K-fold cross validation.Specifically, let O be the set {(i, j, l) : (i, j) ∈ O l , l = 1, . . ., L}, i.e., the set of all index triplets for which x lij is observed.We random permute the set O, and partition it in K non-overlapping folds.For each k = 1, . . ., K, let O k be the k-th fold, and define the L × nm binary matrix W −k as the matrix having, on the l-th row, w lij = 1 if and only if (i, j, l) ∈ O \ O k , and w lij = 0 otherwise.Therefore, we define the training set X −k as the matrix W −k * X.Similarly, letting W k be the L × nm binary matrix having, on the l-th row, w lij = 1 if and only if (i, j, l) ∈ O k , and w lij = 0 otherwise, we define the validation set X k as W k * X.Finally, for each pair of smoothing parameters, we calculate the scores matrix S k = [s k 1 , . . ., s k M ] and the loadings matrix ) nm ] on the training set X −k , and we select the pair of parameters (λ D , λ T ) that minimizes the reconstruction error over the validation set X k , averaged over the K folds: As commented by Hastie et al. (2009) in more classical settings, a too high value of the number of folds K might lead to an approximately unbiased estimate for the reconstruction error, having high variance.On the contrary, a too low value of K might lead to an estimated error with low variance, but high bias.To avoid these two opposite suboptimal solutions, in this work we set K = 10, following the general recommendation in Hastie et al. (2009).

Selection of the optimal number of principal components
Determining the appropriate number of principal components that characterize the data is a critical aspect of Principal Component Analysis, when used for dimensionality reduction.In this work, we select the number of principal components on the basis of the total explained variance, following a standard elbow approach.Specifically, we use the notion of adjusted total variance of the computed principal components, proposed by Zou et al. (2006), and detailed in Lila et al. (2016) for the modeling framework here considered.

Simulation studies
In this section, we assess the performances of the proposed fPCA approach for incomplete functional data, compared to other methods for PCA in presence of missing observations, and under different missing data settings.

Data generation
We consider the spatio-temporal domain D × T , with D = [0, 1] 2 , and T = [0, 1].We consider 3 orthonormal cosinusoidal functions f 1 (p, t), f 2 (p, t), f 3 (p, t) on D × T of the form cos(απp 1 ) cos(βπp 2 ) cos(γπt), for p = (p 1 , p 2 ) ⊤ ∈ D and t ∈ T , where (α, β, γ) are set to (1, 1, 2), (1, 3, 2) and (4, 2, 3) for f 1 , f 2 and f 3 , respectively.Based on these functions, which shall play the role of the principal components, we generate L = 50 fields as 3 and σ 3 = 0.2.The L functions are evaluated on a regular grid of 15×15 points over the spatial domain [0, 1] 2 and at 15 equidistant time points over the temporal domain [0, 1].Finally, data are obtained adding to each x l (p i , t j ) uncorrelated Gaussian errors ϵ l (p i , t j ) with zero mean and standard deviation equal to 10% of the range of the data, obtaining the following noisy and discrete measurements of the L statistical units, for l = 1, . . ., L, i = 1, . . ., n, and j = 1, . . ., m.This leads to the L×nm complete data matrix X.To simulate the presence of missing observations, we consider two different missing data settings: an independent censoring in space and time, obtained as in the censoring scheme (a) of Arnone et al. (2022); a dependent censoring in space and time, in which data might be missing for large regions of the spatio-temporal domain, obtained as in the censoring scheme (d) of Arnone et al. (2022).Figures 3 shows the profile of a sparsely observed space-time signal, resulting from an independent censoring in space and time.Figure 4 instead depicts the same data subject to dependent censoring in space and time.Data generation is repeated 100 times, sampling different scores and errors, and simulating different missing data profiles.The proportion of missing observation is set to 50% for both the considered missing data scenarios.

Competing methods
We compare the following competing methods for the Principal Component Analysis of incomplete data.
• PPCA: the Probabilistic PCA approach (Tipping and Bishop, 1999), as implemented by the R package pcaMethods (Stacklies et al., 2007), followed by a multivariate PCA.• DINEOF: the Data INterpolating Empirical Orthogonal Function (DINEOF) method (Beckers and Rixen, 2003), as implemented in the R package sinkr (Taylor, 2022), followed by a multivariate PCA.• IPPCA: the multivariate Iterative Penalized PCA approach (Josse and Husson, 2012), provided by the R package missMDA (Josse and Husson, 2016), followed by a multivariate PCA.• fPCA: the proposed fPCA approach for incomplete functional data, implemented in the R package fdaPDE (Arnone et al., 2023).We use as nodes of the triangulation the 15 × 15 grid of observations, and consider 15 equidistant nodes over the time domain [0, 1].The optimal pair of smoothing parameters is selected using the K-fold cross validation approach detailed in Section 4.4.

Performance measures
Let ( f nm ) k and ŝk be the estimates of the k-th PC and corresponding scores.To compare the quality of the estimates produced by the considered methods, we consider the Root Mean Square Error (RMSE) of the PCs, evaluated at the spatio-temporal data locations, i.e.,

RMSE((
where fkij denotes the evaluation of the estimated k-th PC ( f nm ) k , at the spatiotemporal location (p i , t j ).Moreover, we compute the RMSE on the scores as 1 We also evaluate the performances of the methods in reconstructing the original data.Let Ŝ = [ŝ 1 , ŝ2 , . . ., ŝM ] and Fnm = [( f nm ) 1 , ( f nm ) 2 , . . ., ( f nm ) M ] be the computed scores and loadings matrices, and define the reconstructed signal as X = X + Ŝ F ⊤ nm .We define the signal reconstruction error as 1

√
Lnm ∥X − X∥ F , where X is the matrix of fully observed data.
Finally, we assess the accuracy of the methods in reconstructing the space spanned by the principal components, considering the principal angle between the space spanned by the true principal components and the estimated ones, computed with the command subspace of the R package pracma (Borchers, 2022).

Results
We here compare the results obtained by the various methods using M = 3 principal components.The data are indeed generated using 3 orthonormal functions, as detailed in Section 5.1.Moreover, all the considered methods correctly select M = 3 components, following an elbow analysis of the total explained variance.Figure 5  visualizations of the true first PC, and its estimates provided by PPCA, DINEOF, IPPCA and the proposed fPCA, in the most challenging scenario of dependent censoring in space and time, as illustrated in Figure 4. Figure C1 in Appendix C reports instead the estimates provided by the various methods in the independent censoring scenario.The panels in the center and right column display the spatial profile of the estimated first PC, at a fixed time step.We observe that all methodologies are able to capture the overall spatial profile of the true principal component, with fPCA producing the smoothest results.The bottom-left panel on the same figure shows instead the temporal profile of the true and estimated first PC, at a fixed spatial location.We note that the proposed fPCA is able to correctly follow the smooth behavior of the true PC function, while PPCA, DINEOF, and IPPCA, which do not account for any temporal correlation in the data, produce an irregular and less accurate estimate.Figures 6 and 7 show the boxplots of the considered performance measures, for the estimates obtained by the various competing methods, across the 100 simulation repetitions.The boxplots show that the proposed fPCA performs equally or better than the competitors, along all performance measures.In particular, it produces the best results in terms of signal reconstruction as well as space reconstruction, both for sparsely observed data and for data presenting large gaps in space-time.In particular, in the simulation considered with sparsely observed data, the signal reconstruction error of the proposed approach is, on average, 83% smaller than that of PPCA and 35% smaller than that of DINEOF and IPPCA.For the most challenging scenario of spacetime dependent censoring, fPCA was able to reduce the average reconstruction error of 28% when compared to IPPCA and DINEOF, and to reduce the space reconstruction error of approximately 45%.We point out that we here reported the performances in the signal reconstruction over the whole space-time grid.However, the ordering of the 6 Application to Lake Victoria satellite data Lakes possess a remarkable ability to stabilize short-term temperature fluctuations, while accentuating long-term variations.Notably, the Lake Surface Water Temperature (LSWT) is internationally acknowledged as an Essential Climate Variable (ECV), serving as a surrogate indicator for climate change and representing an important indicator of lake hydrology and biogeochemistry.Furthermore, ecologists are also interested in understanding the spatial and temporal patterns of LSWT, to gain deeper insights into the dynamics of the environmental system.
With the aim of establishing a comprehensive long-term record of lakes' physical conditions, several satellite temperature data products have been developed.Among these, the ARC-Lake database (MacCallum and Merchant, 2011) offers an array of satellite-derived LSWT datasets, encompassing thousands of lakes worldwide.ARC-Lake is a project funded by the European Space Agency (ESA) and developed by the Earth Observation and Space Research Division at the University of Reading.The ESA's Earth Observing missions, including the series of (Advanced) Along Track Scanning Radiometers, (A)ATSRs, hold the potential to serve as highly accurate sources of information concerning LSWTs on a systematic global scale.However, due to technical challenges and specific meteorological conditions, such as the presence of atmospheric clouds, ice covering, and snow, the recorded data contain a significant number of missing observations.Accurate LSWT reconstructions are crucial in many applied fields, such as climate monitoring and numerical weather prediction.Concerning the latter, for instance, the increasing spatial resolution of weather simulations implies that it is no longer possible to neglect the presence of inland water bodies, nor it suffices to provide coarse approximations of their behavior.
In this section, we apply the proposed fPCA model to investigate the main spatiotemporal patterns in the surface temperature of Lake Victoria, starting from the noisy and incomplete observations shown in Figure 1, considering its complex shoreline and the spatio-temporal correlation in the data.Using the estimated smooth principal components, we can provide a reconstruction of the temperature field over the lake, which results to be more accurate than those provided by other signal reconstruction techniques.

Lake Surface Temperature data
We consider the monthly averaged LSWT of Lake Victoria, in Central Africa.Specifically, each datum is an average of the surface temperature over a lake patch of 0.05 • latitude by 0.05 • longitude, and considering a month of observations (daytime).The resulting value is assigned to the center of the pixel, resulting in a grid of n = 2180 equidistant spatial locations over the lake surface.The observation period goes from January 1996 to December 2011, for a total of 202 observations per spatial location (one per month).The considered data display a proportion of missing data equal to 45.2%, with non-trivial observation patterns, as highlighted in Figure 1.
In order to prepare the spatio-temporal water surface temperature data for the fPCA model, we split the data in L = 16 statistical units, one per each calendar year of observations.Because, fixed a calendar year and a spatial location, we observe its temperature once per month, we have m = 12 time instants per statistical unit.This leads us to an L × nm data matrix X, in which each row corresponds to a calendar year.It should be point out that, in the given grid of observations, there are several spatio-temporal locations (p i , t j ) for which there is no observation available, for any statistical units (i.e., calendar year).These correspond to columns of the data matrix X with all missing entries.As discussed in Section 6.2, this setting is particularly challenging and invalidates some of the available state-of-the-art methods.
We point out that, due to the limited temporal range of the considered data, no long-term changes over the years are evident in this dataset.Consequently, we can treat each calendar year as an independent realization of the same spatio-temporal random field X (p, t), so that the assumptions of the proposed fPCA are met.In presence of long-term trends, it is instead necessary to first detrend the data, to obtain signals exhibiting seasonal behaviors, where each recurrence of the yearly pattern can be regarded as an independent realization of the same spatio-temporal random field.The proposed fPCA can thus be applied to the detrended data.Before fPCA can be applied, the data matrix X must first be centered around its mean.If the noise in the measurements is low, the underlying spatio-temporal signal is smooth, and the data matrix X has no columns with all entries missing, then a simple point-wise mean is sufficient.However, in presence of very noisy data, with strong local variability, and for a data matrix X having columns with all entries missing, it is convenient to compute a smooth mean.This avoids removing much of the data variability in the mean, and bypasses the problem raised by spatio-temporal locations (p i , t j ) for which there is no observation available, for any statistical units.To compute a smooth estimate of the mean spatio-temporal temperature field, we use the smoothing method described in (Arnone et al., 2022) and implemented in the R package fdaPDE (Arnone et al., 2023).The estimate is obtained using the triangulation shown in the left panel of Figure 2, and forcing high values for the smoothing parameters.We then extract the first 8 PCs, from the centred data, using the proposed fPCA approach.To do so, we use again the triangulation shown in the left panel of Figure 2 996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 Representation on first 3 PCs Observed data we select the optimal level of smoothness by K-fold cross validation, as detailed in Section 4.4, using K = 10.We then select M = 3 PCs, since the plot of explained variance, given in the left panel of Figure 8, shows a clear elbow in correspondence of M = 3, with the first 3 PCs accounting for 60% of the total variation in the data, and the remaining PCs contributing less than 6% each.Figure 9  It should be pointed out that, among the state-of-the-arts competitors presented in Section 5.2, only DINEOF is applicable for the considered data.Indeed, the R packages pcaMethods and missMDA, which implements respectively PPCA and IPPCA, do not consent to analyze data matrices X having at least one column without any observed entry.Subsequently, we contrast the proposed functional Principal Component Analysis (fPCA) with DINEOF, executed through the R package sinkr (Taylor, 2022).It is noteworthy that sinkr autonomously determines the most suitable number of Empirical Orthogonal Functions.Through a K-fold cross-validation with K = 10 folds, the average signal reconstruction error for DINEOF is 0.38, whereas for the proposed fPCA, is 0.29, reflecting a notable reduction in error by an average of 25%.

Conclusions
We have presented an innovative method functional Principal Component Analysis of incomplete space-time data.The functional nature of the proposed method makes it able to borrow information from measurements observed at nearby spatio-temporal locations.This permits an accurate identification of the main variability patterns, in space and time, also when data are sparsely observed or in presence of large spatiotemporal gaps in the signals.The simulation studies in Section 5 demonstrate the comparative advantages of the proposed method over state-of-the-art PCA techniques for data with missing values, in both missing data scenarios.In particular, these studies highlight the superiority of the proposed fPCA in terms of signal reconstruction and space reconstruction.The ability to accurately reconstruct signals is a highly valuable feature in environmental applications, where signals are often affected by large observational gaps in space and time, as illustrated by the application to the study of LSWT of Lake Victoria.

Fig. 2 :
Fig. 2: Left: linear finite element basis function ψ(p) defined over a triangulation of the Lake Victoria.Right: cubic B-spline basis functions {ϕ k (t)} k over a time interval T .

Fig. 3 :
Fig.3: Independent censoring in space and time.Left: visualization in space of one sampled signal, at 4 time instants.Right: visualization in time of the same signal, at 5 spatial locations.Data are sparsely observed over the spatio-temporal domain.

Fig. 4 :
Fig. 4: Dependent censoring in space and time.Left: visualization in space of one sampled signal, at 4 time instants.Right: visualization in time of the same signal, at 5 spatial locations.Data display large gaps in space and time.

Fig. 5 :
Fig. 5: Dependent censoring in space and time, first PC.Top-left: spatial profile of the true first PC, at a fixed time step.Center and right panels: spatial profile of the estimated first PC, at the same time step.Bottom-left: temporal profile of the true and estimated first PC, at a fixed spatial location.

Fig. 6 :
Fig. 6: Independent censoring in space and time: boxplots of the errors for Probabilistic PCA (PPCA), Data INterpolating Empirical Orthogonal Function (DINEOF), multivariate Iterative Penalized PCA (IPPCA) and the proposed functional PCA (fPCA).Top: RMSE on the three PCs.Center: RMSE on scores of the three PCs.Bottom: signal reconstruction error and space reconstruction error.

Fig. 8 :
Fig. 8: Left: empirical percentage of variance explained by the first 8 PCs.Right: cumulative percentage of explained variance captured by the first 8 PCs.

Fig. 9 :
Fig. 9: Top left: observed LWST in August 1996.Top right: representation on the estimated first 3 PCs of LWST in August 1996.Center: same as top panels, for March 2011.Bottom: fPCA representation on the estimated first 3 PCs of LWST at the spatial location 3 displayed in the top-left map of Figure 1.
contrasts the observed LWST data, and the representation of LWST data on the first 3 estimated PCs.The top and central panels contrast the observed data (left) and the reconstructions on the first three estimated PCs (right), respectively, in August 1996 and March 2011.The bottom panel contrasts instead the observed temporal profile (in gray) and its reconstruction based on the first 3 estimated PCs, at the spatial location 3 displayed in the top left map of Figure1.The accuracy in the reconstruction, using only 3 PCs, is remarkable.
Fig. C1: Independent censoring in space and time, first PC.Top-left: spatial profile of the true first PC, at a fixed time step.Center and right panels: spatial profile of the estimated first PC, at the same time step.Bottom-left: temporal profile of the true and estimated first PC, at a fixed spatial location.Plots are produced using the same spatial location and time instant considered for producing Figure 5.