Robust estimation of functional factor models with functional pairwise spatial signs

  • Original Paper
  • Computational Statistics

Abstract

Factor model analysis has emerged as a powerful tool to capture the latent dynamic structure of functional data from a dimension-reduction viewpoint. Conventional methods for estimating the factor model are sensitive to heavy tails and outliers. To address this issue and achieve robustness, we provide an eigenvalue-ratio-based method to estimate the number of factors by replacing the covariance operator with the functional pairwise spatial sign operator. Moreover, we propose a two-step robust approach to recover the factor space. The convergence rates of the robust estimators for factor loadings, factor scores, and common components are derived under mild conditions. Numerical studies and a real data analysis confirm that the proposed procedures remain reliable even when the factors and idiosyncratic errors have heavy-tailed distributions.

References

  • Alonso AM, Galeano P, Peña D (2020) A robust procedure to build dynamic factor models with cluster structure. J Econom 216(1):35–52

  • Aneiros G, Cao R, Vieu P (2019) Editorial on the special issue on functional data analysis and related topics

  • Aneiros G, Horová I, Hušková M, Vieu P (2022) On functional data analysis and related topics. J Multivar Anal 189:104861

  • Bali JL, Boente G (2017) Robust estimators under a functional common principal components model. Comput Stat Data Anal 113:424–440

  • Bali JL, Boente G, Tyler DE, Wang JL (2011) Robust functional principal components: a projection-pursuit approach. Ann Stat 39(6):2852–2882

  • Bardsley P, Horváth L, Kokoszka P, Young G (2017) Change point tests in functional factor models with application to yield curves. Econom J 20(1):86–117

  • Boente G, Salibián-Barrera M (2021) Robust functional principal components for sparse longitudinal data. Metron 79(2):159–188

  • Chen L, Wang W, Wu WB (2021) Dynamic semiparametric factor model with structural breaks. J Bus Econ Stat 39(3):757–771

  • Dai X, Müller HG (2018) Principal component analysis for functional data on Riemannian manifolds and spheres. Ann Stat 46(6B):3334–3361

  • Febrero-Bande M, Galeano P, González-Manteiga W (2017) Functional principal component regression and functional partial least-squares regression: an overview and a comparative study. Int Stat Rev 85(1):61–83

  • Gao Y, Shang HL, Yang Y (2019) High-dimensional functional time series forecasting: an application to age-specific mortality rates. J Multivar Anal 170:232–243

  • Gao Y, Shang HL, Yang Y (2021) Factor-augmented smoothing model for functional data. arXiv preprint arXiv:2102.02580

  • Gervini D (2009) Detecting and handling outlying trajectories in irregularly sampled functional datasets. Ann Appl Stat 3(4):1758–1775

  • Guo S, Qiao X, Wang Q (2021) Factor modelling for high-dimensional functional time series. arXiv preprint arXiv:2112.13651

  • Hall P, Hosseini-Nasab M (2006) On properties of functional principal components analysis. J R Stat Soc Ser B Stat Methodol 68(1):109–126

  • Hall P, Müller HG, Wang JL (2006) Properties of principal component methods for functional and longitudinal data analysis. Ann Stat 34(3):1493–1517

  • Hallin M, Nisol G, Tavakoli S (2023) Factor models for high-dimensional functional time series I: representation results. J Time Ser Anal 44(5–6):578–600

  • Han F, Liu H (2018) ECA: high-dimensional elliptical component analysis in non-Gaussian distributions. J Am Stat Assoc 113(521):252–268

  • Hays S, Shen H, Huang JZ (2012) Functional dynamic factor models with application to yield curve forecasting. Ann Appl Stat 6(3):870–894

  • He Y, Kong X, Yu L, Zhang X (2022) Large-dimensional factor analysis without moment constraints. J Bus Econ Stat 40(1):302–312

  • He Y, Li L, Liu D, Zhou WX (2023) Huber principal component analysis for large-dimensional factor models. arXiv preprint arXiv:2303.02817

  • Horváth L, Kokoszka P (2012) Inference for functional data with applications. Springer, Berlin

  • Horváth L, Li B, Li H, Liu Z (2020) Time-varying beta in functional factor models: evidence from China. N Am J Econ Finance 54:101283

  • Kokoszka P, Miao H, Zhang X (2015) Functional dynamic factor model for intraday price curves. J Financ Econom 13(2):456–477

  • Kowal DR, Canale A (2021) Semiparametric functional factor models with Bayesian rank selection. arXiv preprint arXiv:2108.02151

  • Li G, Huang JZ, Shen H (2018) Exponential family functional data analysis via a low-rank model. Biometrics 74(4):1301–1310

  • Li D, Qiao X, Wang Z (2023) Factor-guided estimation of large covariance matrix function with conditional functional sparsity. arXiv preprint arXiv:2311.02450

  • Ling N, Vieu P (2018) Nonparametric modelling for functional data: selected survey and tracks for future. Statistics 52(4):934–949

  • Lu J, Han F, Liu H (2021) Robust scatter matrix estimation for high dimensional distributions with heavy tail. IEEE Trans Inf Theory 67(8):5283–5304

  • Otto S, Salish N (2022) Approximate factor models for functional time series. arXiv preprint arXiv:2201.02532

  • Park Y, Oh HS, Lim Y (2024) A data-adaptive dimension reduction for functional data via penalized low-rank approximation. Stat Comput 34(36):66

  • Ran H, Bai Y (2021) On soft Bayesian additive regression trees and asynchronous longitudinal regression analysis. arXiv preprint arXiv:2108.11603

  • Sawant P, Billor N, Shin H (2012) Functional outlier detection with robust functional principal component analysis. Comput Stat 27:83–102

  • Stock JH, Watson MW (2012) Dynamic factor models. Oxford University Press, Oxford

  • Stock JH, Watson MW (2016) Dynamic factor models, factor-augmented vector autoregressions, and structural vector autoregressions in macroeconomics. In: Handbook of macroeconomics, vol 2. Elsevier, pp 415–525

  • Tang C, Shang HL, Yang Y (2021) Multi-population mortality forecasting using high-dimensional functional factor models. arXiv preprint arXiv:2109.04146

  • Tang C, Shang HL, Yang Y (2022) Clustering and forecasting multiple functional time series. Ann Appl Stat 16(4):2523–2553

  • Tavakoli S, Nisol G, Hallin M (2019) High-dimensional functional factor models. arXiv preprint arXiv:1905.10325

  • Tavakoli S, Nisol G, Hallin M (2023) Factor models for high-dimensional functional time series II: estimation and forecasting. J Time Ser Anal 44(5–6):600–621

  • Wang D, Liu X, Chen R (2019) Factor models for matrix-valued high-dimensional time series. J Econom 208(1):231–248

  • Wang G, Liu S, Han F, Di C (2021) Robust functional principal component analysis via functional pairwise spatial signs. arXiv preprint arXiv:2101.06415

  • Wen S, Lin H (2022) Factor-guided functional PCA for high-dimensional functional data. arXiv preprint arXiv:2211.12012

  • Wohl DA, Zeng D, Stewart P, Glomb N, Alcorn T, Jones S, Handy J, Fiscus S, Weinberg A, Gowda D et al (2005) Cytomegalovirus viremia, mortality, and end-organ disease among patients with AIDS receiving potent antiretroviral therapies. J Acquir Immune Defic Syndr 38(5):538–544

  • Yang X, Du L (2023) Robust multiple testing under high-dimensional dynamic factor model. arXiv preprint arXiv:2303.07631

  • Yao F, Müller HG, Wang JL (2005) Functional data analysis for sparse longitudinal data. J Am Stat Assoc 100(470):577–590

  • Yu L, He Y, Zhang X (2019) Robust factor number specification for large-dimensional elliptical factor model. J Multivar Anal 174:104543

  • Zhong R, Liu S, Li H, Zhang J (2022a) Functional principal component analysis estimator for non-Gaussian data. J Stat Comput Simul 92(13):2788–2801

  • Zhong R, Liu S, Li H, Zhang J (2022b) Robust functional principal component analysis for non-Gaussian longitudinal data. J Multivar Anal 189:104864

Acknowledgements

The authors thank the Editor-in-Chief, an Associate Editor, and two anonymous reviewers for many helpful and constructive comments. This research was sponsored by the National Natural Science Foundation of China (Grant No. 72071068) and China Scholarship Council (202206690042).

Author information

Corresponding author

Correspondence to Nengxiang Ling.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Proof of Lemma 3.1 Based on Assumptions 3.1–3.3, we have \(N(C_2+o(1))\le \lambda _i(\Gamma )\le N(C_1+o(1))\) for \(i\le k\); \(C_2\le \lambda _i(\Gamma )\le C_1\) for \(k<i\le N\); and \(\lambda _i(\Gamma )\le C_1\) for \(i>N\). Moreover, by Theorem 2.4 in Zhong et al. (2022b), \( \lambda _{i}(K)=\mathbb {E}\left[ \frac{\lambda _{i}(\Gamma ) U_{i}^{2}}{\sum _{j=1}^{\infty } \lambda _{j}(\Gamma ) U_{j}^{2}}\right] \) with \( U_{i}=\frac{\langle Y-{\tilde{Y}}, \phi _{i}\rangle }{\sqrt{2 \lambda _{i}(\Gamma )} }\) and \( \sum _{i=1}^{\infty } \lambda _{i}(K)=1 \). Thus, for \(i\le k\),

$$\begin{aligned}\lambda _i(K)\le \frac{\lambda _1(\Gamma )}{\lambda _k(\Gamma )} \mathbb {E}\left[ \frac{U_{i}^{2}}{\sum _{j=1}^{\infty } U_{j}^{2}}\right] \le \frac{C_1}{kC_2}+o(1),\end{aligned}$$
$$\begin{aligned} \begin{aligned} \lambda _i(K)\ge&\mathbb {E}\left[ \frac{\lambda _k(\Gamma )U_{i}^{2}}{\sum _{j=1}^{k} \lambda _j(\Gamma ) U_{j}^{2}+\sum _{j=k+1}^{\infty }C_1 U_{j}^{2}}\right] \\ \ge&\mathbb {E}\left[ \frac{\lambda _k(\Gamma )U_{i}^{2}}{\sum _{j=1}^{k} (\lambda _1(\Gamma )-C_1 )U_{j}^{2}+\sum _{j=1}^{\infty }C_1 U_{j}^{2}}\right] \\=&\frac{\lambda _k(\Gamma )}{\lambda _1(\Gamma )-C_1}\mathbb {E}\left[ \frac{a_1U_{i}^{2}}{\sum _{j=1}^{k} a_1U_{j}^{2}+ 1}\right] , \end{aligned} \end{aligned}$$

where \(a_1=\frac{\lambda _1(\Gamma )}{C_1}-1\), and we have

$$\begin{aligned} k\mathbb {E}\left[ \frac{a_1U_{i}^{2}}{\sum _{j=1}^{k} a_1U_{j}^{2}+ 1}\right] +\mathbb {E}\left[ \frac{1}{\sum _{j=1}^{k} a_1U_{j}^{2}+ 1}\right] =1, \end{aligned}$$
$$\begin{aligned} \begin{aligned} \mathbb {E}\left[ \frac{1}{\sum _{j=1}^{k} a_1 U_{j}^{2}+ 1}\right]&\le \mathbb {E}\left[ \frac{1}{1+a_1 U_{i}^{2}}\right] \le \mathbb {P}(a_1 U_{i}^{2}\le 1)+\frac{1}{2}\mathbb {P}(a_1 U_{i}^{2}> 1)\\&=\frac{1}{2}\mathbb {P}(a_1 U_{i}^{2}\le 1)+\frac{1}{2} \le \frac{1}{2} +o(1). \end{aligned} \end{aligned}$$

Similarly,

$$\begin{aligned}\mathbb {E}\left[ \frac{a_1U_{i}^{2}}{\sum _{j=1}^{k} a_1U_{j}^{2}+ 1}\right] \ge \frac{1}{k}\left( \frac{1}{2}+o(1)\right) ,\end{aligned}$$

which implies that \(\lambda _i(K)\ge \frac{C_2}{2kC_1}+o(1)\).
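The last implication can be spelled out as follows. This is a sketch of the omitted step, assuming in addition that \(\mathbb {P}(U_i=0)=0\) (not stated above), so that \(\mathbb {P}(a_1U_i^2\le 1)\rightarrow 0\) as \(a_1\rightarrow \infty \):

$$\begin{aligned} a_1=\frac{\lambda _1(\Gamma )}{C_1}-1\ge \frac{N(C_2+o(1))}{C_1}-1\rightarrow \infty , \qquad \frac{\lambda _k(\Gamma )}{\lambda _1(\Gamma )-C_1}\ge \frac{N(C_2+o(1))}{N(C_1+o(1))}=\frac{C_2}{C_1}+o(1), \end{aligned}$$

and therefore

$$\begin{aligned} \lambda _i(K)\ge \left( \frac{C_2}{C_1}+o(1)\right) \frac{1}{k}\left( \frac{1}{2}+o(1)\right) =\frac{C_2}{2kC_1}+o(1). \end{aligned}$$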

For \(k<i\le N\), we have

$$\begin{aligned}\lambda _i(K)\le \mathbb {E}\left[ \frac{C_1U_{i}^{2}}{\sum _{j=k+1}^{N} C_2U_{j}^{2}}\right] =\frac{C_1}{(N-k)C_2}=\frac{C_1}{NC_2}(1+o(1))=O\left( \frac{1}{N}\right) .\end{aligned}$$

For \(i> N\), we have

$$\begin{aligned}\lambda _i(K)\le \lambda _N(K)=O\left( \frac{1}{N}\right) .\end{aligned}$$
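The orders in Lemma 3.1 can be checked numerically. The sketch below is illustrative only: it truncates the spectrum to \(N\) terms, takes the scores \(U_j\) to be i.i.d. standard normal, and uses a hypothetical helper name `spatial_sign_eigenvalues`; none of these choices come from the paper. It evaluates the expectation formula for \(\lambda _i(K)\) by Monte Carlo under a spiked spectrum of \(\Gamma \).

```python
import numpy as np

def spatial_sign_eigenvalues(gamma_eigs, n_draws=5000, seed=0):
    """Monte Carlo estimate of E[lam_i U_i^2 / sum_j lam_j U_j^2].

    Assumption for illustration: the U_j are i.i.d. standard normal.
    """
    rng = np.random.default_rng(seed)
    lam = np.asarray(gamma_eigs, dtype=float)
    u2 = rng.standard_normal((n_draws, lam.size)) ** 2
    weighted = u2 * lam                      # lam_j * U_j^2, one row per draw
    return (weighted / weighted.sum(axis=1, keepdims=True)).mean(axis=0)

N, k = 200, 3
# Spiked spectrum: k eigenvalues of order N, the remaining N - k of order one.
gamma_eigs = np.concatenate([N * np.array([3.0, 2.0, 1.0]), np.ones(N - k)])
k_eigs = spatial_sign_eigenvalues(gamma_eigs)
print("leading eigenvalues of K:", k_eigs[:k])
print("largest remaining eigenvalue:", k_eigs[k:].max())
```

Under these assumptions the first \(k\) values stay bounded away from zero while the rest are small, which is the separation the eigenvalue-ratio argument relies on.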

Proof of Theorem 3.1

By Lemma 2.1, Lemma 3.1, and Weyl’s theorem, \( \lambda _{r}({\widehat{K}}) \asymp 1\) for \(r \le k \) and \(\lambda _{r}({\widehat{K}})=O_{p}(N^{-1 / 2})\) for \(r>k \). Let \(\alpha =N^{-1 / 2}\). Then

$$\begin{aligned} \max _{i < k} \frac{\lambda _{i}({\widehat{K}})}{\lambda _{i+1}({\widehat{K}})+c \alpha }=O_{p}(1), \quad \max _{i >k} \frac{\lambda _{i}({\widehat{K}})}{\lambda _{i+1}({\widehat{K}})+c \alpha } =O_{p}(1). \end{aligned}$$

In contrast, at \(i=k\),

$$\begin{aligned} \frac{\lambda _{k}({\widehat{K}})}{\lambda _{k+1}({\widehat{K}})+c \alpha } \ge c \alpha ^{-1} \rightarrow \infty , \end{aligned}$$

so the ratio is maximized at \(i=k\) with probability tending to one, which establishes the consistency.
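The penalized ratio used in this proof can be sketched on discretized curves. The code below is a minimal illustration, not the authors' implementation: it assumes the sample pairwise spatial sign operator is the average of \((Y_i-Y_j)\otimes (Y_i-Y_j)/\Vert Y_i-Y_j\Vert ^2\) over pairs \(i<j\), that all curves share an equally spaced grid, and that the tuning values `k_max` and `c` and the function names are hypothetical.

```python
import numpy as np

def pairwise_spatial_sign_matrix(Y):
    """Sample pairwise spatial sign operator on a common grid.

    Y has shape (N, m): N curves observed at m equally spaced points.
    Each pairwise difference is scaled to unit norm before averaging, so the
    grid spacing cancels and the eigenvalues of the result sum to one.
    """
    N, m = Y.shape
    K = np.zeros((m, m))
    n_pairs = 0
    for i in range(N - 1):
        D = Y[i] - Y[i + 1:]                 # differences Y_i - Y_j for j > i
        norms = np.linalg.norm(D, axis=1)
        keep = norms > 0                     # skip identical curves
        D = D[keep] / norms[keep][:, None]
        K += D.T @ D
        n_pairs += D.shape[0]
    return K / n_pairs

def estimate_num_factors(Y, k_max=8, c=0.1):
    """Maximize lambda_i / (lambda_{i+1} + c * alpha) with alpha = N^{-1/2}."""
    N = Y.shape[0]
    eigvals = np.linalg.eigvalsh(pairwise_spatial_sign_matrix(Y))[::-1]
    alpha = N ** -0.5
    ratios = eigvals[:k_max] / (eigvals[1:k_max + 1] + c * alpha)
    return int(np.argmax(ratios)) + 1, eigvals

# Toy data (assumed design): two orthogonal factor curves, t_3 loadings and errors.
rng = np.random.default_rng(1)
N, m = 150, 50
t = np.linspace(0, 1, m)
F = np.vstack([np.sqrt(2) * np.sin(2 * np.pi * t),
               np.sqrt(2) * np.cos(2 * np.pi * t)])
loadings = rng.standard_t(df=3, size=(N, 2)) * np.array([3.0, 2.0])
Y = loadings @ F + 0.3 * rng.standard_t(df=3, size=(N, m))
k_hat, eigvals = estimate_num_factors(Y)
print(k_hat, eigvals[:4].round(3))
```

The penalty `c * alpha` keeps the denominator away from zero for indices beyond the true number of factors, where consecutive eigenvalues are both close to zero and an unpenalized ratio would be unstable.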

Proof of Theorem 3.2 Similar to the proof of Lemma 2.3 in Horváth and Kokoszka (2012), we have

$$\begin{aligned} \max _{1 \le h \le k}\left\| {\widehat{f}}_{h}-s_hf_{h}\right\| \le \frac{2 \sqrt{2}}{\alpha }\left\| {\widehat{K}}-K\right\| _{\mathcal {S}}, \end{aligned}$$

where \( \alpha =\min \left\{ \lambda _{1}-\lambda _{2}, \ldots , \lambda _{k-1}-\lambda _{k}, \lambda _{k}\right\} \) with \(\lambda _h=\lambda _h(K)\), and the Hilbert-Schmidt norm of a Hilbert-Schmidt operator S is defined by

$$\begin{aligned} \Vert S\Vert _{\mathcal {S}}^{2}=\sum _{j=1}^{\infty }\left\| S\left( e_{j}\right) \right\| ^{2}, \end{aligned}$$

where \( \left\{ e_{1}, e_{2}, \ldots \right\} \) is any orthonormal basis. Then the asymptotic result of the factors follows from Lemma 2.1. In addition, by Assumption 3.3, \(\varvec{S}=\textrm{sgn}(\frac{1}{N} \sum _{i=1}^{N}(\widehat{\varvec{l}}_i \varvec{l}_i^{\top })) =\textrm{diag}\{s_1,\dots ,s_k\}\) with entries \(\pm 1\). Then,

$$\begin{aligned} \begin{aligned} \frac{1}{N}\sum _{i=1}^{N}\Vert \widehat{\varvec{l}}_i - \varvec{Sl}_i \Vert ^2&=\frac{1}{N}\sum _{i=1}^{N}(\widehat{\varvec{l}}_i - \varvec{Sl}_i)^{\top }(\widehat{\varvec{l}}_i - \varvec{Sl}_i) \\&=\frac{1}{N}\sum _{i=1}^{N}\sum _{h=1}^{k}({\widehat{l}}_{ih}-s_h l_{ih})^2. \end{aligned} \end{aligned}$$

Recall that \({\widehat{l}}_{ih}=\langle Y_{i}, {\widehat{f}}_{h}\rangle \). Hence \(\frac{1}{N}\sum _{i=1}^{N}\sum _{h=1}^{k}({\widehat{l}}_{ih}-s_h l_{ih})^2=\frac{1}{N}\sum _{i=1}^{N}\sum _{h=1}^{k} \left( \int _\mathcal {I}\left[ Y_i(t)({\widehat{f}}_h(t)-s_h f_h(t))+\epsilon _i(t)s_h f_h(t)+s_h l_{ih}(f_h(t)f_h(t)-1)\right] \textrm{d}t\right) ^2=O_p(N^{-1})\), where the last equality follows from Assumption 3.1 and the orthonormality of the factor curves. This concludes the proof of the theorem.
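The quantities in this proof can be computed on a grid as follows. This is a hedged sketch under stated assumptions: curves are observed on a common equally spaced grid, inner products are approximated by Riemann sums, the sign matrix \(\varvec{S}\) is evaluated against known loadings as one would in a simulation, and the helper name `sign_operator_eigenfunctions` and the toy design are hypothetical.

```python
import numpy as np

def sign_operator_eigenfunctions(Y, dt, k):
    """Leading k eigenfunctions of the sample pairwise spatial sign operator."""
    N, m = Y.shape
    K = np.zeros((m, m))
    for i in range(N - 1):
        D = Y[i] - Y[i + 1:]                 # differences Y_i - Y_j for j > i
        norms = np.linalg.norm(D, axis=1)
        keep = norms > 0
        D = D[keep] / norms[keep][:, None]
        K += D.T @ D                         # overall scaling does not change eigenvectors
    _, vecs = np.linalg.eigh(K)              # eigenvalues in ascending order
    return vecs[:, ::-1][:, :k].T / np.sqrt(dt)   # rows have unit L2 norm on the grid

rng = np.random.default_rng(2)
N, m, k = 200, 101, 2
t = np.linspace(0, 1, m)
dt = t[1] - t[0]
F = np.vstack([np.sqrt(2) * np.sin(2 * np.pi * t),
               np.sqrt(2) * np.cos(2 * np.pi * t)])
L = rng.standard_t(df=3, size=(N, k)) * np.array([3.0, 1.5])
Y = L @ F + 0.2 * rng.standard_normal((N, m))

F_hat = sign_operator_eigenfunctions(Y, dt, k)
L_hat = Y @ F_hat.T * dt                     # l_hat_ih = <Y_i, f_hat_h>
s = np.sign(np.diag(L_hat.T @ L / N))        # diagonal of sgn(N^{-1} sum_i l_hat_i l_i^T)
loading_err = np.mean(np.sum((L_hat - L * s) ** 2, axis=1))
common_err = np.mean(np.sum((L_hat @ F_hat - L @ F) ** 2, axis=1) * dt)
print(s, loading_err, common_err)
```

The sign vector `s` only resolves the sign indeterminacy of each eigenfunction; the common component `L_hat @ F_hat` does not depend on it.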

Proof of Corollary 3.1 By Theorem 3.2 and the triangle inequality, we have

$$\begin{aligned} \begin{aligned} \frac{1}{N}\sum _{i=1}^{N}\left\| \sum _{h=1}^{k} \left( {\widehat{l}}_{ih}{\widehat{f}}_h-l_{ih} f_{h}\right) \right\| ^2&=\frac{1}{N}\sum _{i=1}^{N}\left\| \sum _{h=1}^{k} \left( {\widehat{l}}_{ih}{\widehat{f}}_h-s_h{\widehat{l}}_{ih}f_h+s_h{\widehat{l}}_{ih}f_h-l_{ih} f_{h}\right) \right\| ^2 \\ {}&\lesssim \frac{1}{N}\sum _{i=1}^{N}\sum _{h=1}^{k}{\widehat{l}}_{ih}^2\Vert {\widehat{f}}_{h}-s_hf_{h}\Vert ^2+\frac{1}{N}\sum _{i=1}^{N}\sum _{h=1}^{k}(s_h {\widehat{l}}_{ih}-l_{ih})^2 \Vert f_h\Vert ^2\\&=O_{p}\left( \frac{1}{N}\right) . \end{aligned} \end{aligned}$$

See Tables 7, 8, 9, 10, 11, 12 and 13.

Table 7 Simulation results for estimating the factor loadings, factor scores, and common components in Scenario III over 500 repetitions, \(k=3\)
Table 8 Simulation results for estimating the factor loadings, factor scores, and common components in Scenario I over 500 repetitions, \(k=2\)
Table 9 Simulation results for estimating the factor loadings, factor scores, and common components in Scenario II over 500 repetitions, \(k=2\)
Table 10 Simulation results for estimating the factor loadings, factor scores, and common components in Scenario III over 500 repetitions, \(k=2\)
Table 11 Simulation results for estimating the factor loadings, factor scores, and common components in Scenario I over 500 repetitions, \(k=4\)
Table 12 Simulation results for estimating the factor loadings, factor scores, and common components in Scenario II over 500 repetitions, \(k=4\)
Table 13 Simulation results for estimating the factor loadings, factor scores, and common components in Scenario III over 500 repetitions, \(k=4\)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yang, S., Ling, N. Robust estimation of functional factor models with functional pairwise spatial signs. Comput Stat (2024). https://doi.org/10.1007/s00180-024-01477-2

  • DOI: https://doi.org/10.1007/s00180-024-01477-2
