
Robust functional principal components for irregularly spaced longitudinal data


Abstract

Consider longitudinal data \(x_{ij},\) with \(i=1,...,n\) and \(j=1,...,p,\) where \(x_{ij}\) is the observation of the smooth random function \(X_{i}\left( .\right) \) at time \(t_{j}.\) The goal of this paper is to develop a parsimonious representation of the data by a linear combination of a set of \(q<p\) smooth functions \(H_{k}\left( .\right) \) (\(k=1,...,q\)) in the sense that \(x_{ij}\approx \mu _{j}+\sum _{k=1}^{q}\beta _{ki}H_{k}\left( t_{j}\right) .\) This representation should be resistant to atypical \(X_{i}\)’s (“case contamination”), resistant to isolated gross errors at some cells \((i,j)\) (“cell contamination”), and applicable when some of the \(x_{ij}\) are missing (“irregularly spaced”, or “incomplete”, data). Two approaches are proposed for this problem. One deals with the three requirements stated above and is based on ideas similar to MM-estimation (Yohai in Ann Stat 15:642–656, 1987). The other is a simple and fast estimator which can be applied to complete data with case- and cellwise contamination, and is based on applying a standard robust principal components estimate and smoothing the principal directions. Experiments with real and simulated data suggest that with complete data the simple estimator outperforms its competitors, while the MM estimator is competitive for incomplete data.


References

  • Bali JL, Boente G, Tyler DE, Wang J-L (2011) Robust functional principal components: a projection-pursuit approach. Ann Stat 39:2852–2882

  • Bay SD (1999) The UCI KDD Archive [http://kdd.ics.uci.edu], University of California, Irvine, Department of Information and Computer Science

  • Boente G, Salibian-Barrera M (2015) S-estimators for functional principal component analysis. JASA 110:1100–1111

  • Cevallos Valdiviezo H (2016) On methods for prediction based on complex data with missing values and robust principal component analysis, PhD thesis, Ghent University (supervisors Van Aelst S. and Van den Poel, D.)

  • Cleveland WS (1979) Robust locally weighted regression and smoothing scatterplots. JASA 74:829–836

  • Gnanadesikan R, Kettenring JR (1972) Robust estimates, residuals, and outlier detection with multiresponse data. Biometrics 28:81–124

  • Górecki T, Krzyśko M, Waszak Ł, Wołyński W (2018) Selected statistical methods of data analysis for multivariate functional data. Stat Pap 59:153–182

  • James GM, Hastie TJ, Sugar CA (2000) Principal component models for sparse functional data. Biometrika 87:587–602

  • Lee S, Shin H, Billor N (2013) M-type smoothing spline estimators for principal functions. Comput Stat Data Anal 66:89–100

  • Locantore N, Marron JS, Simpson DG, Tripoli N, Zhang JT, Cohen KL (1999) Robust principal components for functional data. Test 8:1–28

  • Maronna R (2005) Principal components and orthogonal regression based on robust scales. Technometrics 47:264–273

  • Maronna RA, Martin RD, Yohai VJ, Salibian-Barrera M (2019) Robust statistics: theory and methods (with R), 2nd edn. Wiley, Chichester

  • Rousseeuw PJ, Croux C (1993) Alternatives to the median absolute deviation. JASA 88:1273–1283

  • Yao F, Müller H-G, Wang J-L (2005) Functional data analysis for sparse longitudinal data. JASA 100:577–590

  • Yohai VJ (1987) High breakdown-point and high efficiency robust estimates for regression. Ann Stat 15:642–656

  • Yohai VJ, Zamar RH (1988) High breakdown-point estimates of regression by means of the minimization of an efficient scale. JASA 83:406–413


Acknowledgements

This research was partially supported by Grant 20020170100022BA from the University of Buenos Aires, Argentina. The author thanks the two anonymous reviewers, whose insightful comments greatly helped to improve the paper’s coherence.

Author information


Corresponding author

Correspondence to Ricardo A. Maronna.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: computing algorithm of the MM-estimator

As with most robust estimators, the nonlinearity and lack of convexity of problem (3) preclude finding an “exact” solution, the main difficulty being the choice of initial values for the iterative descent algorithm. The procedure described in this section yields an approximate solution to (3). If one had to deal only with complete data and casewise contamination, it would be relatively straightforward to derive an approximate algorithm (details are omitted for brevity). However, sparsity and/or cellwise contamination make this “straightforward” approach unfeasible. For this reason the components are computed sequentially rather than simultaneously. It is known that in the case \(\rho (t)=t^2\) this sequential approach yields the same result as the direct solution of (3). This fact does not necessarily hold for general \(\rho \), and therefore the sequential algorithm only yields an approximation. However, experiments show that, at least in the case of complete data with casewise contamination, the fits \(\widehat{x}_{ij}\) from the sequential and the “straightforward” algorithm referred to above are practically identical (although not the \(\alpha \)s and \(\beta \)s).

1.1 The componentwise procedure

The main part of the computation proceeds one component at a time. Define for \(i=1,...,n\) and \(j=1,...,p\)

$$\begin{aligned} J_{i}=\{j:\ x_{ij} \text{ is } \text{ non-missing }\},\ I_{j}=\{i:\ x_{ij} \text{ is } \text{ non-missing }\}. \end{aligned}$$

At the beginning (“zero components”): apply robust nonparametric regression to obtain robust and smooth local location and scale values \({\widehat{\mu }}_{0j}\) and \({\widehat{\sigma }}_{0j},\) \(j=1,...,p,\) as follows. Let \(S\left( .,.\right) \) be a robust smoother. Let \(m_{j}\) be a location M-estimator of \(\{x_{ij},~i\in I_{j}\};\) then the set \(\{{\widehat{\mu }}_{0j},~j=1,...,p\}\) is obtained by applying S to \(\left( t_{j},m_{j},~j=1,...,p\right) .\) Let \(s_{j}\) be a \(\tau \)-scale (Yohai and Zamar 1988) of \(\{x_{ij}-{\widehat{\mu }}_{0j},~i\in I_{j}\}\); then the set \(\{{\widehat{\sigma }}_{0j},~j=1,...,p\}\) is obtained by applying S to \(\left( t_{j},s_{j},~j=1,...,p\right) .\) The chosen smoother was the robust version of Loess (Cleveland 1979) with a span of 0.3.

Let \(y_{ij}^{\left( 0\right) }=x_{ij}-{\widehat{\mu }}_{0j}.\) Compute the “unexplained variability”

$$\begin{aligned} V_{0}=\frac{1}{N}\sum _{i=1}^{n}\sum _{j\in J_{i}}{\widehat{\sigma }}_{0j}^{2}\,\rho \left( \frac{y_{ij}^{\left( 0\right) }}{{\widehat{\sigma }}_{0j}}\right) ~~\mathrm {with~}\ N=\sum _{j=1}^{p}\mathrm {card}\left( I_{j}\right) . \end{aligned}$$
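For readers who wish to experiment, the following is a minimal Python sketch of this initialization step (not part of the original paper). It substitutes the columnwise median and normalized MAD for the location M-estimator and the \(\tau \)-scale, uses the lowess smoother from statsmodels in place of robust Loess, and normalizes the bisquare \(\rho \) to a maximum of 1; all function names are hypothetical.

```python
import numpy as np
from scipy.stats import median_abs_deviation
from statsmodels.nonparametric.smoothers_lowess import lowess

def bisquare_rho(u, c=4.0):
    """Tukey bisquare rho, normalized so that rho(u) = 1 for |u| >= c."""
    r = np.minimum(np.abs(u) / c, 1.0)
    return 1.0 - (1.0 - r ** 2) ** 3

def initial_location_scale(X, t, span=0.3):
    """Columnwise robust location/scale, smoothed over time.

    X : (n, p) data matrix with np.nan at missing cells; t : (p,) time grid.
    Median/MAD are stand-ins for the M-location and tau-scale of the paper.
    """
    m = np.nanmedian(X, axis=0)
    s = median_abs_deviation(X, axis=0, scale="normal", nan_policy="omit")
    mu0 = lowess(m, t, frac=span, return_sorted=False)
    sig0 = lowess(s, t, frac=span, return_sorted=False)
    return mu0, np.maximum(sig0, 1e-10)

def unexplained_variability(Y, sig, c=4.0):
    """V = (1/N) * sum over observed cells of sig_j^2 * rho(y_ij / sig_j)."""
    obs = ~np.isnan(Y)
    vals = np.where(obs, sig ** 2 * bisquare_rho(Y / sig, c), 0.0)
    return vals.sum() / obs.sum()
```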

For component 1 use the \(y_{ij}^{\left( 0\right) }\) as input and compute

$$\begin{aligned} \left( \widehat{\varvec{\alpha }}^{\left( 1\right) },\widehat{\varvec{\beta }}^{\left( 1\right) },\widehat{\varvec{\mu }}^{\left( 1\right) }\right) =\arg \min _{\varvec{\alpha },\varvec{\beta },\varvec{\mu }}\sum _{i=1}^{n}\sum _{j\in J_{i}}{\widehat{\sigma }}_{0j}^{2}\rho \left( \frac{y_{ij}^{\left( 0\right) }-\widehat{y}_{ij}^{\left( 0\right) }\left( \varvec{\alpha },\varvec{\beta },\varvec{\mu }\right) }{{\widehat{\sigma }}_{0j}}\right) . \end{aligned}$$
(11)

The minimum is computed iteratively, starting from a deterministic initial estimator to be described in Sect. 3.

Compute the residuals \(y_{ij}^{\left( 1\right) }=y_{ij}^{\left( 0\right) }-\widehat{y}_{ij}^{\left( 0\right) }\left( \widehat{\varvec{\alpha }},\widehat{\varvec{\beta }},\widehat{\varvec{\mu }}\right) .\) Apply a smoother to compute local residual scales \({\widehat{\sigma }}_{1j}\) and the “unexplained variability” with one component:

$$\begin{aligned} V_{1}=\frac{1}{N}\sum _{i=1}^{n}\sum _{j\in J_{i}}{\widehat{\sigma }}_{1j}^{2}\rho \left( \frac{y_{ij}^{\left( 1\right) }}{{\widehat{\sigma }}_{1j}}\right) . \end{aligned}$$

For component k we have

$$\begin{aligned} \left( \widehat{\varvec{\alpha }}^{\left( k\right) },\widehat{\varvec{\beta }}^{\left( k\right) },\widehat{\varvec{\mu }}^{\left( k\right) }\right) =\arg \min _{\varvec{\alpha },\varvec{\beta },\varvec{\mu }}\sum _{i=1}^{n}\sum _{j\in J_{i}}{\widehat{\sigma }}_{k-1,j}^{2}\rho \left( \frac{y_{ij}^{\left( k-1\right) }-\widehat{y}_{ij}^{\left( k-1\right) }\left( \varvec{\alpha },\varvec{\beta },\varvec{\mu }\right) }{{\widehat{\sigma }}_{k-1,j}}\right) . \end{aligned}$$
(12)

Each component is orthogonalized with respect to the former ones. The procedure stops either at a fixed number of components or when the proportion of explained variability (5) is larger than a given value (e.g. 0.90).

1.2 The iterative descent algorithm

Computing each component requires an iterative algorithm and starting values. The algorithm is essentially one of “alternating regressions”.

Recall that each step deals with a single component, so that \(\varvec{\alpha }\in R^{m}\) and \(\varvec{\beta }\in R^{n}.\) Put as usual \(\psi =\rho ^{\prime }\) and \(W(s)=\psi \left( s\right) /s,\) and write for brevity \(h\left( t\right) =\sum _{l=1}^{m}\alpha _{l}B_{l}\left( t\right) ,\) where the \(B_{l}\) are the elements of the spline basis.

Differentiating the criterion in (12) yields a set of estimating equations that can be written in fixed-point form, leading to a “weighted alternating regressions” scheme. To simplify the notation, the superscript \(\left( k-1\right) \) will be dropped from \(y_{ij}^{\left( k-1\right) }\) and \(\widehat{y}_{ij}^{\left( k-1\right) }.\) Put

$$\begin{aligned} w_{ij}=w_{ij}\left( \varvec{\alpha ,\beta ,\mu }\right) =W\left( \frac{y_{ij}-\widehat{y}_{ij}\left( \varvec{\alpha ,\beta ,\mu }\right) }{{\widehat{\sigma }}_{k-1,j}}\right) . \end{aligned}$$

Then \(\mu _{j}\) and \(\beta _{i}\) can be expressed as weighted residual means and weighted univariate least squares regressions, respectively:

$$\begin{aligned} \mu _{j}&=\frac{1}{\sum _{i\in I_{j}}w_{ij}}\sum _{i\in I_{j}}w_{ij}\left( y_{ij}-\beta _{i}h\left( t_{j}\right) \right) , \\ \beta _{i}&=\frac{\sum _{j\in J_{i}}w_{ij}h\left( t_{j}\right) \left( y_{ij}-\mu _{j}\right) }{\sum _{j\in J_{i}}w_{ij}h\left( t_{j}\right) ^{2}}, \end{aligned}$$

and \(\varvec{\alpha }\) is the solution of

$$\begin{aligned} \sum _{i=1}^{n}\sum _{j\in J_{i}}w_{ij}\left( y_{ij}-\mu _{j}\right) \beta _{i}{\mathbf {b}}\left( t_{j}\right) =\sum _{i=1}^{n}\sum _{j\in J_{i}} w_{ij}\beta _{i}^{2}{\mathbf {b}}\left( t_{j}\right) {\mathbf {b}}\left( t_{j}\right) ^{\prime }\varvec{\alpha } \end{aligned}$$

with \({\mathbf {b}}\left( t\right) =\left( B_{1}\left( t\right) ,...,B_{m}\left( t\right) \right) ^{\prime }.\)

At each iteration the \(w_{ij}\) are updated. It can be shown that the criterion descends at each iteration.
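A minimal sketch of one sweep of these fixed-point updates is given below, assuming the \(p\times m\) spline basis matrix B evaluated at the time grid is available; the variable names and the order of the three updates are the writer’s choices, not taken from the paper.

```python
import numpy as np

def bisquare_weight(u, c=4.0):
    """W(s) = psi(s)/s for the Tukey bisquare."""
    r = np.abs(u) / c
    return np.where(r < 1.0, (1.0 - r ** 2) ** 2, 0.0)

def alternating_sweep(Y, B, alpha, beta, mu, sig, c=4.0):
    """One sweep of the weighted alternating regressions.

    Y : (n, p) residuals from the previous components, np.nan at missing cells
    B : (p, m) spline basis;  sig : (p,) local residual scales
    """
    obs = ~np.isnan(Y)
    Y0 = np.where(obs, Y, 0.0)
    h = B @ alpha                                     # h(t_j), shape (p,)
    fit = mu[None, :] + np.outer(beta, h)
    w = bisquare_weight((Y0 - fit) / sig, c) * obs    # zero weight at missing cells

    # mu_j: weighted mean of y_ij - beta_i h(t_j) over i in I_j
    mu = (w * (Y0 - np.outer(beta, h))).sum(0) / np.maximum(w.sum(0), 1e-10)

    # beta_i: weighted no-intercept regression of y_ij - mu_j on h(t_j), j in J_i
    R = Y0 - mu[None, :]
    beta = (w * R * h).sum(1) / np.maximum((w * h ** 2).sum(1), 1e-10)

    # alpha: solve the weighted normal equations displayed in the text
    cvec = (w * R * beta[:, None]).sum(0)             # length p
    dvec = (w * beta[:, None] ** 2).sum(0)            # length p
    alpha = np.linalg.solve(B.T @ (dvec[:, None] * B), B.T @ cvec)
    return alpha, beta, mu
```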

1.3 The initial values

For each component, initial values for \(\varvec{\alpha }\) and \(\varvec{\beta }\) are needed. They should be deterministic, since subsampling would make the procedure impractically slow.

1.3.1 The initial \(\varvec{\alpha }\)

For \(k,l\in \left\{ 1,...,p\right\} \) call \(N_{kl}\) the number of cases that have values in both \(t_{k}\) and \(t_{l}:\)

$$\begin{aligned} N_{kl}=\#\left( I_{k}\cap I_{l}\right) . \end{aligned}$$

In longitudinal studies, many \(N_{kl}\) may be null or very small.

Compute a (possibly incomplete) \(p\times p\) matrix \(\varvec{\varSigma }=[\sigma _{kl}]\) of pairwise robust covariances of \(\left( y_{ik},y_{il}:i\in I_{k}\cap I_{l}\right) \) with the Gnanadesikan and Kettenring (1972) procedure:

$$\begin{aligned} \mathrm {Cov}\left( X,Y\right) =\frac{1}{4}\left( S\left( X+Y\right) ^{2}-S\left( X-Y\right) ^{2}\right) , \end{aligned}$$

where S is a robust dispersion. Here S was chosen as the \(Q_{n}\) estimator of Rousseeuw and Croux (1993) on the basis of exploratory simulations.

Compute \(\sigma _{kl}\) as above for \(\left( k,l\right) \) such that \(N_{kl}\ge 3\). If \(\min _{kl}N_{kl}\) is “large enough” (here: \(\ge 10\)) use the resulting \(\varvec{\varSigma }.\)

Otherwise apply a two-dimensional smoother to improve \({\varvec{\Sigma }}\) and to fill in the missing values. The bivariate Loess was employed for this purpose.
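A sketch of the pairwise covariance step is shown below. It uses the normalized MAD as the robust dispersion S (the paper uses \(Q_{n}\)) and simply leaves entries with too few overlapping cases as missing, instead of applying the bivariate-Loess fill-in; names are hypothetical.

```python
import numpy as np
from scipy.stats import median_abs_deviation

def pairwise_gk_cov(Y, min_pairs=3):
    """Gnanadesikan-Kettenring pairwise covariances on the overlapping cases.

    Y : (n, p) matrix with np.nan at missing cells.  Entries (k, l) with fewer
    than `min_pairs` common observations are returned as np.nan.
    """
    p = Y.shape[1]
    Sigma = np.full((p, p), np.nan)
    for k in range(p):
        for l in range(k, p):
            both = ~np.isnan(Y[:, k]) & ~np.isnan(Y[:, l])
            if both.sum() < min_pairs:
                continue
            u, v = Y[both, k], Y[both, l]
            s_plus = median_abs_deviation(u + v, scale="normal")   # MAD in place of Q_n
            s_minus = median_abs_deviation(u - v, scale="normal")
            Sigma[k, l] = Sigma[l, k] = 0.25 * (s_plus ** 2 - s_minus ** 2)
    return Sigma
```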

Then compute the first eigenvector \({\mathbf {e}}\) of \(\varvec{\varSigma }\) (note that \(\varvec{\varSigma }\) is not guaranteed to be positive definite and hence further principal components may be unreliable).

Given \(\mathbf {e}\), smooth it using the spline basis. Then \(\varvec{\alpha }\) follows from (4).
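A sketch of this step follows; since Eq. (4) is not reproduced in this appendix, the smoothing of \(\mathbf {e}\) is simply taken as its least-squares projection onto the spline basis, and missing entries of \(\varvec{\varSigma }\) are crudely set to zero rather than filled in by the bivariate smoother.

```python
import numpy as np

def initial_alpha(Sigma, B):
    """Leading eigenvector of the pairwise covariance matrix, projected onto
    the spline basis B (p x m) by least squares (a stand-in for Eq. (4))."""
    S = np.where(np.isnan(Sigma), 0.0, Sigma)        # crude fill-in of missing entries
    S = (S + S.T) / 2.0                              # symmetrize
    vals, vecs = np.linalg.eigh(S)
    e = vecs[:, -1]                                  # eigenvector of the largest eigenvalue
    alpha, *_ = np.linalg.lstsq(B, e, rcond=None)    # smooth e onto the basis
    return alpha
```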

1.3.2 The initial \(\varvec{\beta }\)

For \(i=1,...,n\) the initial \(\beta _{i}\) is obtained by a robust univariate regression of \(y_{ij}\) on \(h\left( t_{j}\right) \) \((j\in J_{i}),\) namely \(L_{1}\) regression, which is fast and reliable.

Note that only cellwise outliers matter at this step.
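Since this is a univariate regression through the origin, the \(L_{1}\) fit reduces to a weighted median of the ratios \(y_{ij}/h(t_{j})\) with weights \(|h(t_{j})|\); the sketch below uses this standard identity directly (names are hypothetical).

```python
import numpy as np

def l1_slope(y, h):
    """No-intercept L1 regression of y on h: argmin_b sum_j |y_j - b * h_j|,
    computed as the weighted median of y_j / h_j with weights |h_j|."""
    mask = ~np.isnan(y) & (h != 0)
    r, w = y[mask] / h[mask], np.abs(h[mask])
    order = np.argsort(r)
    r, w = r[order], w[order]
    cw = np.cumsum(w)
    return r[np.searchsorted(cw, 0.5 * cw[-1])]

def initial_beta(Y, h):
    """Initial beta_i: L1 slope of the residuals of case i on h(t_j), j in J_i."""
    return np.array([l1_slope(Y[i], h) for i in range(Y.shape[0])])
```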

1.4 The final adjustment

Note that the former steps yield only an approximate solution of (3), since the components are computed one at a time. In order to improve the approximation a natural procedure is as follows. After computing q components we have a \(p\times q\)-matrix \({\mathbf {U}}\) of principal directions with elements

$$\begin{aligned} u_{jk}=\sum _{l=1}^{m}B_{l}\left( t_{j}\right) \alpha _{kl}, \end{aligned}$$

an \(n\times p\)-matrix of weights \(\mathbf {W}\), and a location vector \(\varvec{\mu }.\) Then a natural improvement is, keeping \({\mathbf {U}}\) and \(\varvec{\mu }\) fixed, to recompute the \(\beta \)s by means of univariate weighted regressions with weights \(w_{ij}.\) Let \(\varvec{\beta }_{i}=[\beta _{ik},~k=1,...,q],\) and set

$$\begin{aligned} \varvec{\beta }_{i}=\arg \min _{\varvec{\beta }\in R^{q}}\sum _{j\in J_{i}} w_{ij}\left( x_{ij}-\mu _{j}-\left( \mathbf {U}\varvec{\beta }\right) _{j}\right) ^{2}. \end{aligned}$$

The effect of this step in the case of complete data is negligible, but it does improve the estimator’s behavior for incomplete data.

However, it was found that the improvement is not sufficient when the data are very sparse. For this reason a different approach was used, namely to compute \(\varvec{\beta }_{i}\) as a regression M-estimate. Let \({\mathbf {z}}_{i}=(x_{ij}-\mu _{j}:j\in J_{i})\) and \(\mathbf {V}=[v_{jk}]\) with \(v_{jk}=u_{jk}\) for \(j\in J_{i}.\) Then \(\varvec{\beta }_{i}\) is a bisquare regression estimate of \({\mathbf {z}}_{i}\) on \(\mathbf {V}\), with tuning constant equal to 4, using \(L_{1}\) as a starting estimate. Note that here only cell outliers matter, and therefore \(L_{1}\) yields reliable starting values. The estimator resulting from this step does not necessarily coincide with (3), but simulations show that it is much better than the “natural” adjustment described above when the data are very sparse.
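The sketch below illustrates the per-case M-regression of this final step under two simplifications: the starting value is ordinary least squares rather than \(L_{1}\), and the residual scale is taken as the normalized median of absolute residuals; both are the writer’s assumptions.

```python
import numpy as np

def bisquare_weight(u, c=4.0):
    r = np.abs(u) / c
    return np.where(r < 1.0, (1.0 - r ** 2) ** 2, 0.0)

def final_beta(x_i, mu, U, c=4.0, n_iter=20):
    """Bisquare regression M-estimate (via IRLS) of z_i = x_i - mu on the rows
    of U restricted to the observed cells of case i."""
    obs = ~np.isnan(x_i)
    z, V = (x_i - mu)[obs], U[obs, :]
    beta, *_ = np.linalg.lstsq(V, z, rcond=None)          # LS start (the paper uses L1)
    for _ in range(n_iter):
        r = z - V @ beta
        s = 1.4826 * np.median(np.abs(r)) + 1e-10         # assumed residual scale
        w = bisquare_weight(r / s, c)
        WV = V * w[:, None]
        beta = np.linalg.solve(V.T @ WV + 1e-10 * np.eye(V.shape[1]), WV.T @ z)
    return beta
```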

Appendix B: the “naive” estimator: details

In step 1 of Sect. 3, compute for each \({\mathbf {x}}_{i}\) robust local location and scatter estimates \({\widetilde{\mu }}_{i} ,{\widetilde{\sigma }}_{i}\). The “cleaned” values are

$$\begin{aligned} \widetilde{x}_{ij}={\widetilde{\mu }}_{i}+{\widetilde{\sigma }}_{i}\psi \left( \frac{x_{ij}-{\widetilde{\mu }}_{i}}{{\widetilde{\sigma }}_{i}}\right) , \end{aligned}$$

where \(\psi \) is the bisquare \(\psi \)-function with tuning constant equal to 4.
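In code, this cleaning step amounts to a smooth winsorization of each cell towards its local fit; a minimal sketch (names hypothetical, local estimates assumed already computed) is:

```python
import numpy as np

def bisquare_psi(u, c=4.0):
    """Tukey bisquare psi-function."""
    r = u / c
    return np.where(np.abs(r) < 1.0, u * (1.0 - r ** 2) ** 2, 0.0)

def clean_cells(X, mu_loc, sig_loc, c=4.0):
    """x~_ij = mu~_i + sig~_i * psi((x_ij - mu~_i) / sig~_i).
    mu_loc, sig_loc: local location/scale, broadcastable against X (e.g. shape (n, 1))."""
    return mu_loc + sig_loc * bisquare_psi((X - mu_loc) / sig_loc, c)
```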

The ordinary robust PCs of step 2 are computed from the cleaned data \(\widetilde{x}_{ij}\) with the S-M estimator of Maronna (2005). Call \(\{\widehat{{\mathbf {x}}}_{i}^{\left( q\right) }\}\) the fit for q components and put \(r_{i}^{\left( q\right) }=\left\| {\mathbf {x}}_{i}-\widehat{{\mathbf {x}}}_{i}^{\left( q\right) }\right\| ,\) \(i=1,...,n.\) Then the S-M estimator minimizes \(S\left( r_{i}^{\left( q\right) },\ i=1,...,n\right) \) where S is the bisquare M-scale, using the “spherical principal components” of Locantore et al. (1999) as the starting point. The “proportion of unexplained variability” is

$$\begin{aligned} V_q=\frac{S\left( r_{i}^{\left( q\right) },\ i=1,...,n\right) }{S\left( r_{i}^{\left( 0\right) },\ i=1,...,n\right) }, \end{aligned}$$

and the “proportion of explained variability” is defined as \(1-V_q\).
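A sketch of this computation follows; the bisquare M-scale is implemented as a simple fixed-point iteration with the usual 50%-breakdown tuning (c ≈ 1.548, δ = 0.5), which is an assumption rather than a constant quoted in the paper.

```python
import numpy as np

def bisquare_rho(u, c=1.548):
    r = np.minimum(np.abs(u) / c, 1.0)
    return 1.0 - (1.0 - r ** 2) ** 3

def m_scale(r, c=1.548, delta=0.5, n_iter=50):
    """Bisquare M-scale: solves mean(rho(r_i / s)) = delta by fixed-point iteration."""
    s = np.median(np.abs(r)) / 0.6745 + 1e-10
    for _ in range(n_iter):
        s *= np.sqrt(np.mean(bisquare_rho(r / s, c)) / delta)
    return s

def explained_variability(resid_q, resid_0):
    """1 - V_q, with V_q the ratio of M-scales of the residual norms."""
    return 1.0 - m_scale(resid_q) / m_scale(resid_0)
```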

The number of knots in step 3 is chosen through generalized cross-validation.
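As an illustration of the knot selection, the sketch below scores candidate numbers of interior knots by GCV for a least-squares cubic spline fit, taking the effective degrees of freedom as the number of basis functions; the candidate grid and the placement of knots at quantiles are the writer’s choices, not taken from the paper.

```python
import numpy as np
from scipy.interpolate import LSQUnivariateSpline

def gcv_num_knots(t, y, candidates=(3, 5, 8, 12, 16), k=3):
    """Pick the number of interior knots minimizing GCV = (RSS/n) / (1 - df/n)^2."""
    order = np.argsort(t)
    ts, ys = t[order], y[order]
    n, best_m, best_gcv = len(ts), None, np.inf
    for m in candidates:
        knots = np.quantile(ts, np.linspace(0, 1, m + 2)[1:-1])   # interior knots
        spl = LSQUnivariateSpline(ts, ys, knots, k=k)
        rss = spl.get_residual()                                  # residual sum of squares
        df = m + k + 1                                            # number of basis functions
        gcv = (rss / n) / (1.0 - df / n) ** 2
        if gcv < best_gcv:
            best_m, best_gcv = m, gcv
    return best_m
```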


Cite this article

Maronna, R.A. Robust functional principal components for irregularly spaced longitudinal data. Stat Papers 62, 1563–1582 (2021). https://doi.org/10.1007/s00362-019-01147-2
