Abstract
Consider longitudinal data \(x_{ij},\) with \(i=1,...,n\) and \(j=1,...,p,\) where \(x_{ij}\) is the observation of the smooth random function \(X_{i}\left( .\right) \) at time \(t_{j}.\) The goal of this paper is to develop a parsimonious representation of the data by a linear combination of a set of \(q<p\) smooth functions \(H_{k}\left( .\right) \) (\(k=1,...,q\)) in the sense that \(x_{ij}\approx \mu _{j}+\sum _{k=1}^{q}\beta _{ki}H_{k}\left( t_{j}\right) .\) This representation should be resistant to atypical \(X_{i}\)’s (“case contamination”), resistant to isolated gross errors at some cells \((i,j)\) (“cell contamination”), and applicable when some of the \(x_{ij}\) are missing (“irregularly spaced”, or “incomplete”, data). Two approaches are proposed for this problem. One deals with all three requirements stated above, and is based on ideas similar to MM-estimation (Yohai in Ann Stat 15:642–656, 1987). The other is a simple and fast estimator, applicable to complete data with case- and cellwise contamination, based on computing a standard robust principal components estimate and smoothing the principal directions. Experiments with real and simulated data suggest that with complete data the simple estimator outperforms its competitors, while the MM-estimator is competitive for incomplete data.
References
Bali JL, Boente G, Tyler DE, Wang J-L (2011) Robust functional principal components: a projection-pursuit approach. Ann Stat 39:2852–2882
Bay SD (1999) The UCI KDD Archive [http://kdd.ics.uci.edu], University of California, Irvine, Department of Information and Computer Science
Boente G, Salibian-Barrera M (2015) S-estimators for functional principal component analysis. J Am Stat Assoc 110:1100–1111
Cevallos Valdiviezo H (2016) On methods for prediction based on complex data with missing values and robust principal component analysis, PhD thesis, Ghent University (supervisors Van Aelst S. and Van den Poel, D.)
Cleveland WS (1979) Robust locally weighted regression and smoothing scatterplots. J Am Stat Assoc 74:829–836
Gnanadesikan R, Kettenring JR (1972) Robust estimates, residuals, and outlier detection with multiresponse data. Biometrics 28:81–124
Górecki T, Krzyśko M, Waszak Ł, Wołyński W (2018) Selected statistical methods of data analysis for multivariate functional data. Stat Pap 59:153–182
James GM, Hastie TJ, Sugar CA (2000) Principal component models for sparse functional data. Biometrika 87:587–602
Lee S, Shin H, Billor N (2013) M-type smoothing spline estimators for principal functions. Comput Stat Data Anal 66:89–100
Locantore N, Marron JS, Simpson DG, Tripoli N, Zhang JT, Cohen KL (1999) Robust principal components for functional data. Test 8:1–28
Maronna R (2005) Principal components and orthogonal regression based on robust scales. Technometrics 47:264–273
Maronna RA, Martin RD, Yohai VJ, Salibian-Barrera M (2019) Robust statistics: theory and methods (with R), 2nd edn. Wiley, Chichester
Rousseeuw PJ, Croux C (1993) Alternatives to the median absolute deviation. J Am Stat Assoc 88:1273–1283
Yao F, Müller H-G, Wang J-L (2005) Functional data analysis for sparse longitudinal data. J Am Stat Assoc 100:577–590
Yohai VJ (1987) High breakdown-point and high efficiency robust estimates for regression. Ann Stat 15:642–656
Yohai VJ, Zamar RH (1988) High breakdown-point estimates of regression by means of the minimization of an efficient scale. J Am Stat Assoc 83:406–413
Acknowledgements
This research was partially supported by Grant 20020170100022BA from the University of Buenos Aires, Argentina. The author thanks the two anonymous reviewers for their insightful comments, which greatly helped to improve the paper’s coherence.
Appendices
Appendix A: computing algorithm of the MM-estimator
As with most robust estimators, the nonlinearity and lack of convexity of problem (3) preclude finding an “exact” solution; the main difficulty is the choice of initial values for the iterative descent algorithm. The procedure described in this section yields an approximate solution to (3). If one had to deal only with complete data and casewise contamination, it would be relatively straightforward to derive an approximate algorithm (details are omitted for brevity). However, sparsity and/or cellwise contamination make this “straightforward” approach unfeasible. For this reason the components are computed sequentially rather than simultaneously. It is known that in the case \(\rho (t)=t^2\) this sequential approach yields the same result as the direct solution of (3). This fact does not necessarily hold for general \(\rho \), and therefore the sequential algorithm yields only an approximation. However, experiments show that, at least in the case of complete data with casewise contamination, the fits \(\widehat{x}_{ij}\) from the sequential and the “straightforward” algorithms referred to above are practically identical (although not the \(\alpha \)s and \(\beta \)s).
1.1 The componentwise procedure
The main part of the computation proceeds one component at a time. Define for \(i=1,...,n\) and \(j=1,...,p\)
At the beginning (“zero components”): apply robust nonparametric regression to obtain robust and smooth local location and scale values \({\widehat{\mu }}_{0j}\) and \({\widehat{\sigma }}_{0j},\) \(j=1,...,p,\) as follows. Let \(S\left( .,.\right) \) be a robust smoother. Let \(m_{j}\) be a location M-estimator of \(\{x_{ij},~i\in I_{j}\};\) then the set \(\{{\widehat{\mu }}_{0j},~j=1,...,p\}\) is obtained by applying S to \(\left( t_{j},m_{j},j=1,...,p\right) .\) Let \(s_{j}\) be a \(\tau \)-scale (Yohai and Zamar 1988) of \(\{x_{ij}-{\widehat{\mu }}_{0j},i\in I_{j}\}\); then the set \(\{{\widehat{\sigma }}_{0j},j=1,...,p\}\) is obtained by applying S to \(\left( t_{j},s_{j},j=1,...,p\right) .\) The chosen smoother was the robust version of loess (Cleveland 1979) with a span of 0.3.
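As an illustration, this “zero components” step can be sketched in Python. The median, the normalized MAD, and a running-median smoother are stand-ins for the location M-estimator, the \(\tau \)-scale, and robust loess, respectively; they are not the paper’s exact choices.

```python
import numpy as np

def running_median(v, half=2):
    """Crude stand-in for the robust smoother S: centered running median."""
    p = len(v)
    return np.array([np.median(v[max(0, j - half):j + half + 1])
                     for j in range(p)])

def robust_baseline(x):
    """'Zero components' step: per-time-point robust location and scale,
    then smoothing across time.  x is (n, p) with NaN for missing cells.
    The median replaces the location M-estimator, the normalized MAD
    replaces the tau-scale, and running_median replaces robust loess."""
    p = x.shape[1]
    m = np.array([np.nanmedian(x[:, j]) for j in range(p)])          # m_j
    s = np.array([1.4826 * np.nanmedian(np.abs(x[:, j] - m[j]))      # s_j
                  for j in range(p)])
    return running_median(m), running_median(s)   # mu_0j, sigma_0j
```

With these stand-ins the output has the right qualitative behavior (e.g. noiseless data give zero local scales), which is all the sketch is meant to convey.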
Let \(y_{ij}^{\left( 0\right) }=x_{ij}-{\widehat{\mu }}_{0j}.\) Compute the “unexplained variability”
For component 1 use the \(y_{ij}^{\left( 0\right) }\) as input and compute
The minimum is computed iteratively, starting from a deterministic initial estimator to be described in Sect. 3.
Compute the residuals \(y_{ij}^{\left( 1\right) }=y_{ij}^{\left( 0\right) }-\widehat{y}_{ij}^{\left( 0\right) }\left( \widehat{\varvec{\alpha }},\widehat{\varvec{\beta }},\widehat{\varvec{\mu }}\right) .\) Apply a smoother to compute local residual scales \({\widehat{\sigma }}_{1j}\) and the “unexplained variability” with one component:
For component k we have
Each component is orthogonalized with respect to the former ones. The procedure stops either at a fixed number of components or when the proportion of explained variability (5) is larger than a given value (e.g. 0.90).
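One simple way to carry out the orthogonalization of a new component against the former ones is a classical Gram–Schmidt sweep; the following Python sketch is illustrative, as the paper does not spell out this particular implementation.

```python
import numpy as np

def orthogonalize(H, h_new):
    """Make the new component h_new (its values at t_1,...,t_p) orthogonal
    to the previously computed components, stored as orthonormal columns
    of H, and normalize the result."""
    for k in range(H.shape[1]):
        h_new = h_new - (h_new @ H[:, k]) * H[:, k]
    return h_new / np.linalg.norm(h_new)
```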
1.2 The iterative descent algorithm
Computing each component requires an iterative algorithm and starting values. The algorithm is essentially one of “alternating regressions”.
Recall that a single component is computed at each step, so that \(\varvec{\alpha }\in R^{m}\) and \(\varvec{\beta }\in R^{n}.\) Put as usual \(\psi =\rho ^{\prime }\) and \(W(s)=\psi \left( s\right) /s.\) Put for brevity \(h\left( t\right) =\sum _{l=1}^{m}\alpha _{l}B_{l}\left( t\right) ,\) where the \(B_{l}\) are the elements of the spline basis.
Differentiating the criterion in (12) yields a set of estimating equations that can be written in fixed-point form, yielding a “weighted alternating regressions” scheme. To simplify the notation the superscript \(\left( k-1\right) \) will be dropped from \(y_{ij}^{\left( k-1\right) }\) and \(\widehat{y}_{ij}^{\left( k-1\right) }.\) Put
Then \(\mu _{j}\) and \(\beta _{i}\) can be expressed as weighted residual means and weighted univariate least squares regressions, respectively:
and \(\varvec{\alpha }\) is the solution of
with \({\mathbf {b}}\left( t\right) =\left( B_{1}\left( t\right) ,...,B_{m}\left( t\right) \right) ^{\prime }.\)
At each iteration the \(w_{ij}\) are updated. It can be shown that the criterion descends at each iteration.
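One sweep of this “weighted alternating regressions” scheme can be sketched as follows. The bisquare weight with tuning constant \(c=4\) is illustrative (the paper does not fix it here), and missing cells simply receive zero weight.

```python
import numpy as np

def bisquare_weight(r, c=4.0):
    """W(s) = psi(s)/s for Tukey's bisquare psi (c = 4 is illustrative)."""
    s = np.abs(r) / c
    return np.where(s < 1, (1 - s**2)**2, 0.0)

def alternating_step(y, h, beta, mu, sigma, c=4.0):
    """One sweep of the weighted alternating regressions (a sketch).
    y: (n, p) residuals with NaN for missing cells; h: (p,) component
    values h(t_j); beta: (n,) scores; mu: (p,) offsets; sigma: (p,)
    local residual scales."""
    miss = np.isnan(y)
    y0 = np.where(miss, 0.0, y)
    r = (y0 - mu[None, :] - np.outer(beta, h)) / sigma[None, :]
    w = bisquare_weight(r, c)
    w[miss] = 0.0                      # missing cells do not contribute
    # mu_j: weighted mean of the residuals y_ij - beta_i h(t_j)
    mu_new = (w * (y0 - np.outer(beta, h))).sum(0) / np.maximum(w.sum(0), 1e-12)
    # beta_i: weighted univariate LS regression of y_ij - mu_j on h(t_j)
    num = (w * (y0 - mu_new[None, :]) * h[None, :]).sum(1)
    den = np.maximum((w * h[None, :] ** 2).sum(1), 1e-12)
    return mu_new, num / den
```

A quick sanity check on the sketch: at a noiseless rank-one configuration the sweep is a fixed point.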
1.3 The initial values
For each component, initial values for \(\varvec{\alpha }\) and \(\varvec{\beta }\) are needed. They should be deterministic, since subsampling would make the procedure impractically slow.
1.3.1 The initial \(\varvec{\alpha }\)
For \(k,l\in \left\{ 1,...,p\right\} \) let \(N_{kl}\) be the number of cases observed at both \(t_{k}\) and \(t_{l}:\)
In longitudinal studies, many \(N_{kl}\) may be null or very small.
Compute a (possibly incomplete) \(p\times p\) matrix \(\varvec{\varSigma }=\left[ \sigma _{kl}\right] \) of pairwise robust covariances of \(\left( y_{ik},y_{il}:i\in I_{k}\cap I_{l}\right) \) with the Gnanadesikan and Kettenring (1972) procedure:
where S is a robust dispersion. Here S was chosen as the \(Q_{n}\) estimator of Rousseeuw and Croux (1993) on the basis of exploratory simulations.
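The Gnanadesikan–Kettenring identity \(\sigma _{kl}=\{S(y_{k}+y_{l})^{2}-S(y_{k}-y_{l})^{2}\}/4\) is easy to sketch in Python; here the normalized MAD stands in for the \(Q_{n}\) estimator actually used.

```python
import numpy as np

def mad(v):
    """Normalized MAD: a stand-in for the Qn robust dispersion."""
    return 1.4826 * np.median(np.abs(v - np.median(v)))

def gk_cov(yk, yl):
    """Gnanadesikan-Kettenring pairwise robust covariance:
    (S(y_k + y_l)^2 - S(y_k - y_l)^2) / 4 with S = mad."""
    return (mad(yk + yl) ** 2 - mad(yk - yl) ** 2) / 4.0
```

By construction \(\sigma _{kk}=S(y_{k})^{2}\) up to the factor hidden in the identity, so the diagonal reduces to squared dispersions, a convenient check.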
Compute \(\sigma _{kl}\) as above for \(\left( k,l\right) \) such that \(N_{kl}\ge 3.\) If \(\min _{kl}N_{kl}\) is “large enough” (here: \(\ge 10\)) use the resulting \(\varvec{\varSigma }.\)
Otherwise apply a two-dimensional smoother to improve \({\varvec{\Sigma }}\) and to fill in the missing values. The bivariate Loess was employed for this purpose.
Then compute the first eigenvector \({\mathbf {e}}\) of \(\varvec{\varSigma }\) (note that \(\varvec{\varSigma }\) is not guaranteed to be positive definite and hence further principal components may be unreliable).
Given \({\mathbf {e}},\) smooth it using the spline basis. Then \(\varvec{\alpha }\) follows from (4).
1.3.2 The initial \(\varvec{\beta }\)
For \(i=1,...,n\) the initial \(\beta _{i}\) is a robust univariate regression of \(y_{ij}\) on \(h\left( t_{j}\right) \) \((j\in J_{i}),\) namely the \(L_{1}\) regression, which is fast and reliable.
Note that only cellwise outliers matter at this step.
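The univariate \(L_{1}\) regression of \(y_{ij}\) on \(h(t_{j})\) through the origin has a closed form: a weighted median of the ratios \(y_{ij}/h(t_{j})\) with weights \(|h(t_{j})|\), which is what makes this step fast. A sketch:

```python
import numpy as np

def l1_slope(y, h):
    """L1 regression of y on h through the origin:
    argmin_b sum_j |y_j - b h_j|, computed as a weighted median of the
    ratios y_j / h_j with weights |h_j| (points with h_j = 0 drop out)."""
    mask = h != 0
    ratios, w = y[mask] / h[mask], np.abs(h[mask])
    order = np.argsort(ratios)
    ratios, w = ratios[order], w[order]
    cum = np.cumsum(w)
    return ratios[np.searchsorted(cum, 0.5 * cum[-1])]
```

As expected for an \(L_{1}\) fit, a single gross cell leaves the slope untouched.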
1.4 The final adjustment
Note that the preceding steps yield only an approximate solution of (3), since the components are computed one at a time. In order to improve the approximation a natural procedure is as follows. After computing q components we have a \(p\times q\) matrix \({\mathbf {U}}\) of principal directions with elements
an \(n\times p\) matrix of weights \({\mathbf {W}},\) and a location vector \(\varvec{\mu }.\) Then a natural improvement is, keeping \({\mathbf {U}}\) and \(\varvec{\mu }\) fixed, to recompute the \(\beta \)s by means of univariate weighted regressions with weights \(w_{ij}.\) Let \(\varvec{\beta }_{i.}=[\beta _{ik},k=1,...,q],\) and set
The effect of this step in the case of complete data is negligible, but it does improve the estimator’s behavior for incomplete data.
However, it was found that the improvement is not sufficient when the data are very sparse. For this reason a different approach was used, namely, to compute \(\varvec{\beta }_{i}\) as a regression M-estimate. Let \({\mathbf {z}}_{i}=(x_{ij}-\mu _{j}:j\in J_{i})\) and \({\mathbf {V}}=[v_{jk}]\) with \(v_{jk}=u_{jk}\) for \(j\in J_{i}.\) Then \(\varvec{\beta }_{i}\) is a bisquare regression estimate of \({\mathbf {z}}_{i}\) on \({\mathbf {V}},\) with tuning constant equal to 4, using \(L_{1}\) as a starting estimate. Note that here only cell outliers matter, and therefore \(L_{1}\) yields reliable starting values. The estimator resulting from this step does not necessarily coincide with (3), but simulations show that it is much better than the “natural” adjustment described above when the data are very sparse.
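A minimal iteratively-reweighted-least-squares sketch of such a bisquare regression M-estimate follows; the fixed residual scale and the small ridge term are implementation conveniences of the sketch, and in practice the start would be the \(L_{1}\) fit.

```python
import numpy as np

def bisquare_w(r, c=4.0):
    """Bisquare weight function with tuning constant c = 4."""
    s = np.abs(r) / c
    return np.where(s < 1, (1 - s**2)**2, 0.0)

def m_regression(z, V, beta0, scale, c=4.0, n_iter=20):
    """Bisquare regression M-estimate of z on V by iteratively
    reweighted least squares, started from beta0.  'scale' is a fixed
    robust residual scale; the ridge term guards against a singular
    weighted design when many weights vanish."""
    beta = np.asarray(beta0, dtype=float).copy()
    q = V.shape[1]
    for _ in range(n_iter):
        w = bisquare_w((z - V @ beta) / scale, c)
        WV = V * w[:, None]
        beta = np.linalg.solve(V.T @ WV + 1e-10 * np.eye(q), WV.T @ z)
    return beta
```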
Appendix B: the “naive” estimator: details
In step 1 of Sect. 3, compute for each \({\mathbf {x}}_{i}\) robust local location and scatter estimates \({\widetilde{\mu }}_{i} ,{\widetilde{\sigma }}_{i}\). The “cleaned” values are
where \(\psi \) is the bisquare \(\psi \)-function with tuning constant equal to 4.
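A sketch of this cleaning step, assuming the usual form \(\widetilde{x}_{ij}={\widetilde{\mu }}_{i}+{\widetilde{\sigma }}_{i}\,\psi \left( (x_{ij}-{\widetilde{\mu }}_{i})/{\widetilde{\sigma }}_{i}\right) \) (an assumption here, since the display is not reproduced above):

```python
import numpy as np

def bisquare_psi(u, c=4.0):
    """Tukey bisquare psi: u (1 - (u/c)^2)^2 for |u| < c, else 0."""
    return np.where(np.abs(u) < c, u * (1 - (u / c) ** 2) ** 2, 0.0)

def clean_cells(x, mu, sigma, c=4.0):
    """Cell cleaning: x~_ij = mu_i + sigma_i psi((x_ij - mu_i)/sigma_i),
    with mu_i, sigma_i per-case location and scale.  Moderate cells are
    barely changed; gross cell outliers (|u| >= c) are pulled to mu_i."""
    u = (x - mu[:, None]) / sigma[:, None]
    return mu[:, None] + sigma[:, None] * bisquare_psi(u, c)
```

Since \(\psi (u)=0\) for \(|u|\ge c\), a gross cell is replaced by the case’s own location, which is exactly why cellwise outliers cannot leak into the subsequent PCA.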
The ordinary robust PCs of step 2 are computed from the cleaned data \(\widetilde{x}_{ij}\) with the S-M estimator of Maronna (2005). Call \(\{\widehat{{\mathbf {x}}}_{i}^{\left( q\right) }\}\) the fit for q components and put \(r_{i}^{\left( q\right) }=\left\| {\mathbf {x}}_{i}-\widehat{{\mathbf {x}}}_{i}^{\left( q\right) }\right\| ,\) \(i=1,...,n.\) Then the S-M estimator minimizes \(S\left( r_{i}^{\left( q\right) },\ i=1,...,n\right) ,\) where S is the bisquare M-scale, using the “spherical principal components” of Locantore et al. (1999) as starting point. The “proportion of unexplained variability” is
and the “proportion of explained variability” is defined as \(1-V_q\).
The number of knots in step 3 is chosen through generalized cross-validation.
Maronna, R.A. Robust functional principal components for irregularly spaced longitudinal data. Stat Papers 62, 1563–1582 (2021). https://doi.org/10.1007/s00362-019-01147-2