Abstract
Consider longitudinal data \(x_{ij},\) with \(i=1,...,n\) and \(j=1,...,p,\) where \(x_{ij}\) is the observation of the smooth random function \(X_{i}\left( .\right) \) at time \(t_{j}.\) The goal of this paper is to develop a parsimonious representation of the data by a linear combination of a set of \(q<p\) smooth functions \(H_{k}\left( .\right) \) (\(k=1,...,q\)) in the sense that \(x_{ij}\approx \mu _{j}+\sum _{k=1}^{q}\beta _{ki}H_{k}\left( t_{j}\right) .\) This representation should be resistant to atypical \(X_{i}\)’s (“case contamination”), resistant to isolated gross errors at some cells \((i,j)\) (“cell contamination”), and applicable when some of the \(x_{ij}\) are missing (“irregularly spaced”, or “incomplete”, data). Two approaches are proposed for this problem. One deals with all three requirements stated above, and is based on ideas similar to MM-estimation (Yohai in Ann Stat 15:642–656, 1987). The other is a simple and fast estimator, applicable to complete data with case- and cellwise contamination, based on computing a standard robust principal components estimate and smoothing the principal directions. Experiments with real and simulated data suggest that with complete data the simple estimator outperforms its competitors, while the MM-estimator is competitive for incomplete data.
References
Bali JL, Boente G, Tyler DE, Wang J-L (2011) Robust functional principal components: a projection-pursuit approach. Ann Stat 39:2852–2882
Bay SD (1999) The UCI KDD Archive [http://kdd.ics.uci.edu], University of California, Irvine, Department of Information and Computer Science
Boente G, Salibian-Barrera M (2015) S-estimators for functional principal component analysis. J Am Stat Assoc 110:1100–1111
Cevallos Valdiviezo H (2016) On methods for prediction based on complex data with missing values and robust principal component analysis, PhD thesis, Ghent University (supervisors Van Aelst S. and Van den Poel, D.)
Cleveland WS (1979) Robust locally weighted regression and smoothing scatterplots. J Am Stat Assoc 74:829–836
Gnanadesikan R, Kettenring JR (1972) Robust estimates, residuals, and outlier detection with multiresponse data. Biometrics 28:81–124
Górecki T, Krzyśko M, Waszak Ł, Wołyński W (2018) Selected statistical methods of data analysis for multivariate functional data. Stat Pap 59:153–182
James GM, Hastie TJ, Sugar CA (2000) Principal component models for sparse functional data. Biometrika 87:587–602
Lee S, Shin H, Billor N (2013) M-type smoothing spline estimators for principal functions. Comput Stat Data Anal 66:89–100
Locantore N, Marron JS, Simpson DG, Tripoli N, Zhang JT, Cohen KL (1999) Robust principal components for functional data. Test 8:1–28
Maronna R (2005) Principal components and orthogonal regression based on robust scales. Technometrics 47:264–273
Maronna RA, Martin RD, Yohai VJ, Salibian-Barrera M (2019) Robust statistics: theory and methods (with R), 2nd edn. Wiley, Chichester
Rousseeuw PJ, Croux C (1993) Alternatives to the median absolute deviation. J Am Stat Assoc 88:1273–1283
Yao F, Müller H-G, Wang J-L (2005) Functional data analysis for sparse longitudinal data. J Am Stat Assoc 100:577–590
Yohai VJ (1987) High breakdown-point and high efficiency robust estimates for regression. Ann Stat 15:642–656
Yohai VJ, Zamar RH (1988) High breakdown-point estimates of regression by means of the minimization of an efficient scale. J Am Stat Assoc 83:406–413
Acknowledgements
This research was partially supported by Grant 20020170100022BA from the University of Buenos Aires, Argentina. The author thanks the two anonymous reviewers for their insightful comments, which greatly helped to improve the paper’s coherence.
Appendices
Appendix A: computing algorithm of the MM-estimator
As with most robust estimators, the nonlinearity and lack of convexity of problem (3) preclude finding an “exact” solution; the main difficulty is the choice of initial values for the iterative descent algorithm. The procedure described in this section yields an approximate solution to (3). If one had to deal only with complete data and casewise contamination, it would be relatively straightforward to derive an approximate algorithm (details are omitted for brevity). However, sparsity and/or cellwise contamination make this “straightforward” approach unfeasible. For this reason the components are computed sequentially rather than simultaneously. It is known that in the case \(\rho (t)=t^2\) this sequential approach yields the same result as the direct solution of (3). This fact does not necessarily hold for general \(\rho \), and therefore the sequential algorithm yields only an approximation. However, experiments show that, at least in the case of complete data with casewise contamination, the fits \(\widehat{x}_{ij}\) from the sequential and the “straightforward” algorithms referred to above are practically identical (although not the \(\alpha \)s and \(\beta \)s).
1.1 The componentwise procedure
The main part of the computation proceeds one component at a time. Define for \(i=1,...,n\) and \(j=1,...,p\)
At the beginning (“zero components”): apply robust nonparametric regression to obtain robust and smooth local location and scale values \({\widehat{\mu }}_{0j}\) and \({\widehat{\sigma }}_{0j},\) \(j=1,...,p,\) as follows. Let \(S\left( .,.\right) \) be a robust smoother. Let \(m_{j}\) be a location M-estimator of \(\{x_{ij},~i\in I_{j}\};\) then the set \(\{{\widehat{\mu }}_{0j},~j=1,...,p\}\) is obtained by applying S to \(\left( t_{j},m_{j},j=1,...,p\right) .\) Let \(s_{j}\) be a \(\tau \)-scale (Yohai and Zamar 1988) of \(\{x_{ij}-{\widehat{\mu }}_{0j},i\in I_{j}\}\); then the set \(\{{\widehat{\sigma }}_{0j},j=1,...,p\}\) is obtained by applying S to \(\left( t_{j},s_{j},j=1,...,p\right) .\) The chosen smoother was the robust version of loess (Cleveland 1979) with a span of 0.3.
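As an illustration, this “zero components” step can be sketched in Python. The median, the normalized MAD, and a running-median smoother are stand-ins for the location M-estimator, the \(\tau \)-scale, and robust loess, respectively; they are not the paper’s exact choices.

```python
import numpy as np

def running_median(v, half=2):
    """Crude stand-in for the robust smoother S: centered running median."""
    p = len(v)
    return np.array([np.median(v[max(0, j - half):j + half + 1])
                     for j in range(p)])

def robust_baseline(x):
    """'Zero components' step: per-time-point robust location and scale,
    then smoothing across time.  x is (n, p) with NaN for missing cells.
    The median replaces the location M-estimator, the normalized MAD
    replaces the tau-scale, and running_median replaces robust loess."""
    p = x.shape[1]
    m = np.array([np.nanmedian(x[:, j]) for j in range(p)])          # m_j
    s = np.array([1.4826 * np.nanmedian(np.abs(x[:, j] - m[j]))      # s_j
                  for j in range(p)])
    return running_median(m), running_median(s)   # mu_0j, sigma_0j
```

With these stand-ins the output has the right qualitative behavior (e.g. noiseless data give zero local scales), which is all the sketch is meant to convey.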
Let \(y_{ij}^{\left( 0\right) }=x_{ij}-{\widehat{\mu }}_{0j}.\) Compute the “unexplained variability”
For component 1 use the \(y_{ij}^{\left( 0\right) }\) as input and compute
The minimum is computed iteratively, starting from a deterministic initial estimator to be described in Sect. 3.
Compute the residuals \(y_{ij}^{\left( 1\right) }=y_{ij}^{\left( 0\right) }-\widehat{y}_{ij}^{\left( 0\right) }\left( \widehat{\varvec{\alpha }},\widehat{\varvec{\beta }},\widehat{\varvec{\mu }}\right) .\) Apply a smoother to compute local residual scales \({\widehat{\sigma }}_{1j}\) and the “unexplained variability” with one component:
For component k we have
Each component is orthogonalized with respect to the former ones. The procedure stops either at a fixed number of components or when the proportion of explained variability (5) is larger than a given value (e.g. 0.90).
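One simple way to carry out the orthogonalization of a new component against the former ones is a classical Gram–Schmidt sweep; the following Python sketch is illustrative, as the paper does not spell out this particular implementation.

```python
import numpy as np

def orthogonalize(H, h_new):
    """Make the new component h_new (its values at t_1,...,t_p) orthogonal
    to the previously computed components, stored as orthonormal columns
    of H, and normalize the result."""
    for k in range(H.shape[1]):
        h_new = h_new - (h_new @ H[:, k]) * H[:, k]
    return h_new / np.linalg.norm(h_new)
```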
1.2 The iterative descent algorithm
Computing each component requires an iterative algorithm and starting values. The algorithm is essentially one of “alternating regressions”.
Recall that a single component is computed at each step, so that \(\varvec{\alpha }\in R^{m}\) and \(\varvec{\beta }\in R^{n}.\) Put as usual \(\psi =\rho ^{\prime }\) and \(W(s)=\psi \left( s\right) /s.\) Put for brevity \(h\left( t\right) =\sum _{l=1}^{m}\alpha _{l}B_{l}\left( t\right) ,\) where the \(B_{l}\) are the elements of the spline basis.
Differentiating the criterion in (12) yields a set of estimating equations that can be written in fixed-point form, yielding a “weighted alternating regressions” scheme. To simplify the notation the superscript \(\left( k-1\right) \) will be dropped from \(y_{ij}^{\left( k-1\right) }\) and \(\widehat{y}_{ij}^{\left( k-1\right) }.\) Put
Then \(\mu _{j}\) and \(\beta _{i}\) can be expressed as weighted residual means and weighted univariate least squares regressions, respectively:
and \(\varvec{\alpha }\) is the solution of
with \({\mathbf {b}}\left( t\right) =\left( B_{1}\left( t\right) ,...,B_{m}\left( t\right) \right) ^{\prime }.\)
At each iteration the \(w_{ij}\) are updated. It can be shown that the criterion descends at each iteration.
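One sweep of this “weighted alternating regressions” scheme can be sketched as follows. The bisquare weight with tuning constant \(c=4\) is illustrative (the paper does not fix it here), and missing cells simply receive zero weight.

```python
import numpy as np

def bisquare_weight(r, c=4.0):
    """W(s) = psi(s)/s for Tukey's bisquare psi (c = 4 is illustrative)."""
    s = np.abs(r) / c
    return np.where(s < 1, (1 - s**2)**2, 0.0)

def alternating_step(y, h, beta, mu, sigma, c=4.0):
    """One sweep of the weighted alternating regressions (a sketch).
    y: (n, p) residuals with NaN for missing cells; h: (p,) component
    values h(t_j); beta: (n,) scores; mu: (p,) offsets; sigma: (p,)
    local residual scales."""
    miss = np.isnan(y)
    y0 = np.where(miss, 0.0, y)
    r = (y0 - mu[None, :] - np.outer(beta, h)) / sigma[None, :]
    w = bisquare_weight(r, c)
    w[miss] = 0.0                      # missing cells do not contribute
    # mu_j: weighted mean of the residuals y_ij - beta_i h(t_j)
    mu_new = (w * (y0 - np.outer(beta, h))).sum(0) / np.maximum(w.sum(0), 1e-12)
    # beta_i: weighted univariate LS regression of y_ij - mu_j on h(t_j)
    num = (w * (y0 - mu_new[None, :]) * h[None, :]).sum(1)
    den = np.maximum((w * h[None, :] ** 2).sum(1), 1e-12)
    return mu_new, num / den
```

A quick sanity check on the sketch: at a noiseless rank-one configuration the sweep is a fixed point.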
1.3 The initial values
For each component, initial values for \(\varvec{\alpha }\) and \(\varvec{\beta }\) are needed. They should be deterministic, since subsampling would make the procedure impractically slow.
1.3.1 The initial \(\varvec{\alpha }\)
For \(k,l\in \left\{ 1,...,p\right\} \) let \(N_{kl}\) be the number of cases observed at both \(t_{k}\) and \(t_{l}:\)
In longitudinal studies, many \(N_{kl}\) may be null or very small.
Compute a (possibly incomplete) \(p\times p\) matrix \(\varvec{\varSigma }=\left[ \sigma _{kl}\right] \) of pairwise robust covariances of \(\left( y_{ik},y_{il}:i\in I_{k}\cap I_{l}\right) \) with the Gnanadesikan and Kettenring (1972) procedure:
where S is a robust dispersion. Here S was chosen as the \(Q_{n}\) estimator of Rousseeuw and Croux (1993) on the basis of exploratory simulations.
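The Gnanadesikan–Kettenring identity \(\sigma _{kl}=\{S(y_{k}+y_{l})^{2}-S(y_{k}-y_{l})^{2}\}/4\) is easy to sketch in Python; here the normalized MAD stands in for the \(Q_{n}\) estimator actually used.

```python
import numpy as np

def mad(v):
    """Normalized MAD: a stand-in for the Qn robust dispersion."""
    return 1.4826 * np.median(np.abs(v - np.median(v)))

def gk_cov(yk, yl):
    """Gnanadesikan-Kettenring pairwise robust covariance:
    (S(y_k + y_l)^2 - S(y_k - y_l)^2) / 4 with S = mad."""
    return (mad(yk + yl) ** 2 - mad(yk - yl) ** 2) / 4.0
```

By construction \(\sigma _{kk}=S(y_{k})^{2}\) up to the factor hidden in the identity, so the diagonal reduces to squared dispersions, a convenient check.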
Compute \(\sigma _{kl}\) as above for \(\left( k,l\right) \) such that \(N_{kl}\ge 3.\) If \(\min _{kl}N_{kl}\) is “large enough” (here: \(\ge 10\)) use the resulting \(\varvec{\varSigma }.\)
Otherwise apply a two-dimensional smoother to improve \({\varvec{\Sigma }}\) and to fill in the missing values. The bivariate Loess was employed for this purpose.
Then compute the first eigenvector \({\mathbf {e}}\) of \(\varvec{\varSigma }\) (note that \(\varvec{\varSigma }\) is not guaranteed to be positive definite and hence further principal components may be unreliable).
Given \({\mathbf {e}},\) smooth it using the spline basis. Then \(\varvec{\alpha }\) follows from (4).
1.3.2 The initial \(\varvec{\beta }\)
For \(i=1,...,n\) the initial \(\beta _{i}\) is a robust univariate regression of \(y_{ij}\) on \(h\left( t_{j}\right) \) \((j\in J_{i}),\) namely the \(L_{1}\) regression, which is fast and reliable.
Note that only cellwise outliers matter at this step.
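The univariate \(L_{1}\) regression of \(y_{ij}\) on \(h(t_{j})\) through the origin has a closed form: a weighted median of the ratios \(y_{ij}/h(t_{j})\) with weights \(|h(t_{j})|\), which is what makes this step fast. A sketch:

```python
import numpy as np

def l1_slope(y, h):
    """L1 regression of y on h through the origin:
    argmin_b sum_j |y_j - b h_j|, computed as a weighted median of the
    ratios y_j / h_j with weights |h_j| (points with h_j = 0 drop out)."""
    mask = h != 0
    ratios, w = y[mask] / h[mask], np.abs(h[mask])
    order = np.argsort(ratios)
    ratios, w = ratios[order], w[order]
    cum = np.cumsum(w)
    return ratios[np.searchsorted(cum, 0.5 * cum[-1])]
```

As expected for an \(L_{1}\) fit, a single gross cell leaves the slope untouched.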
1.4 The final adjustment
Note that the preceding steps yield only an approximate solution of (3), since the components are computed one at a time. In order to improve the approximation a natural procedure is as follows. After computing q components we have a \(p\times q\) matrix \({\mathbf {U}}\) of principal directions with elements
an \(n\times p\) matrix of weights \({\mathbf {W}},\) and a location vector \(\varvec{\mu }.\) Then a natural improvement is, keeping \({\mathbf {U}}\) and \(\varvec{\mu }\) fixed, to recompute the \(\beta \)s by means of univariate weighted regressions with weights \(w_{ij}.\) Let \(\varvec{\beta }_{i.}=[\beta _{ik},k=1,...,q],\) and set
The effect of this step in the case of complete data is negligible, but it does improve the estimator’s behavior for incomplete data.
However, it was found that the improvement is not sufficient when the data are very sparse. For this reason a different approach was used, namely, to compute \(\varvec{\beta }_{i}\) as a regression M-estimate. Let \({\mathbf {z}}_{i}=(x_{ij}-\mu _{j}:j\in J_{i})\) and \({\mathbf {V}}=[v_{jk}]\) with \(v_{jk}=u_{jk}\) for \(j\in J_{i}.\) Then \(\varvec{\beta }_{i}\) is a bisquare regression estimate of \({\mathbf {z}}_{i}\) on \({\mathbf {V}},\) with tuning constant equal to 4, using \(L_{1}\) as a starting estimate. Note that here only cell outliers matter, and therefore \(L_{1}\) yields reliable starting values. The estimator resulting from this step does not necessarily coincide with (3), but simulations show that it is much better than the “natural” adjustment described above when the data are very sparse.
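A minimal iteratively-reweighted-least-squares sketch of such a bisquare regression M-estimate follows; the fixed residual scale and the small ridge term are implementation conveniences of the sketch, and in practice the start would be the \(L_{1}\) fit.

```python
import numpy as np

def bisquare_w(r, c=4.0):
    """Bisquare weight function with tuning constant c = 4."""
    s = np.abs(r) / c
    return np.where(s < 1, (1 - s**2)**2, 0.0)

def m_regression(z, V, beta0, scale, c=4.0, n_iter=20):
    """Bisquare regression M-estimate of z on V by iteratively
    reweighted least squares, started from beta0.  'scale' is a fixed
    robust residual scale; the ridge term guards against a singular
    weighted design when many weights vanish."""
    beta = np.asarray(beta0, dtype=float).copy()
    q = V.shape[1]
    for _ in range(n_iter):
        w = bisquare_w((z - V @ beta) / scale, c)
        WV = V * w[:, None]
        beta = np.linalg.solve(V.T @ WV + 1e-10 * np.eye(q), WV.T @ z)
    return beta
```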
Appendix B: the “naive” estimator: details
In step 1 of Sect. 3, compute for each \({\mathbf {x}}_{i}\) robust local location and scatter estimates \({\widetilde{\mu }}_{i} ,{\widetilde{\sigma }}_{i}\). The “cleaned” values are
where \(\psi \) is the bisquare \(\psi \)-function with tuning constant equal to 4.
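A sketch of this cleaning step, assuming the usual form \(\widetilde{x}_{ij}={\widetilde{\mu }}_{i}+{\widetilde{\sigma }}_{i}\,\psi \left( (x_{ij}-{\widetilde{\mu }}_{i})/{\widetilde{\sigma }}_{i}\right) \) (an assumption here, since the display is not reproduced above):

```python
import numpy as np

def bisquare_psi(u, c=4.0):
    """Tukey bisquare psi: u (1 - (u/c)^2)^2 for |u| < c, else 0."""
    return np.where(np.abs(u) < c, u * (1 - (u / c) ** 2) ** 2, 0.0)

def clean_cells(x, mu, sigma, c=4.0):
    """Cell cleaning: x~_ij = mu_i + sigma_i psi((x_ij - mu_i)/sigma_i),
    with mu_i, sigma_i per-case location and scale.  Moderate cells are
    barely changed; gross cell outliers (|u| >= c) are pulled to mu_i."""
    u = (x - mu[:, None]) / sigma[:, None]
    return mu[:, None] + sigma[:, None] * bisquare_psi(u, c)
```

Since \(\psi (u)=0\) for \(|u|\ge c\), a gross cell is replaced by the case’s own location, which is exactly why cellwise outliers cannot leak into the subsequent PCA.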
The ordinary robust PCs of step 2 are computed from the cleaned data \(\widetilde{x}_{ij}\) with the S-M estimator of Maronna (2005). Call \(\{\widehat{{\mathbf {x}}}_{i}^{\left( q\right) }\}\) the fit for q components and put \(r_{i}^{\left( q\right) }=\left\| {\mathbf {x}}_{i}-\widehat{{\mathbf {x}}}_{i}^{\left( q\right) }\right\| ,\) \(i=1,...,n.\) Then the S-M estimator minimizes \(S\left( r_{i}^{\left( q\right) },\ i=1,...,n\right) ,\) where S is the bisquare M-scale, using the “spherical principal components” of Locantore et al. (1999) as starting point. The “proportion of unexplained variability” is
and the “proportion of explained variability” is defined as \(1-V_q\).
The number of knots in step 3 is chosen through generalized cross-validation.
Maronna, R.A. Robust functional principal components for irregularly spaced longitudinal data. Stat Papers 62, 1563–1582 (2021). https://doi.org/10.1007/s00362-019-01147-2