Robust estimation of functional factor models with functional pairwise spatial signs

Yang, Shuquan; Ling, Nengxiang

doi:10.1007/s00180-024-01477-2

Original Paper
Published: 13 March 2024

Computational Statistics (2024)

Shuquan Yang¹ &
Nengxiang Ling¹

Abstract

Factor model analysis has emerged as a powerful tool to capture the latent dynamic structure of functional data from a dimension-reduction viewpoint. Conventional methods for estimating the factor model are sensitive to heavy tails and outliers. To address this issue and achieve robustness, we provide an eigenvalue-ratio based method to estimate the number of factors by replacing the covariance operator with the functional pairwise spatial sign operator. Moreover, we propose a two-step robust approach to recover the factor space. The convergence rates of the robust estimators for factor loadings, factor scores, and common components are derived under some mild conditions. Numerical studies and a real data analysis confirm the proposed procedures remain reliable even when the factors and idiosyncratic errors have heavy-tailed distributions.

References

Alonso AM, Galeano P, Peña D (2020) A robust procedure to build dynamic factor models with cluster structure. J Econom 216(1):35–52
Aneiros G, Cao R, Vieu P (2019) Editorial on the special issue on functional data analysis and related topics
Aneiros G, Horová I, Hušková M, Vieu P (2022) On functional data analysis and related topics. J Multivar Anal 189:104861
Bali JL, Boente G (2017) Robust estimators under a functional common principal components model. Comput Stat Data Anal 113:424–440
Bali JL, Boente G, Tyler DE, Wang JL (2011) Robust functional principal components: a projection-pursuit approach. Ann Stat 39(6):2852–2882
Bardsley P, Horváth L, Kokoszka P, Young G (2017) Change point tests in functional factor models with application to yield curves. Econom J 20(1):86–117
Boente G, Salibián-Barrera M (2021) Robust functional principal components for sparse longitudinal data. Metron 79(2):159–188
Chen L, Wang W, Wu WB (2021) Dynamic semiparametric factor model with structural breaks. J Bus Econ Stat 39(3):757–771
Dai X, Müller HG (2018) Principal component analysis for functional data on Riemannian manifolds and spheres. Ann Stat 46(6B):3334–3361
Febrero-Bande M, Galeano P, González-Manteiga W (2017) Functional principal component regression and functional partial least-squares regression: an overview and a comparative study. Int Stat Rev 85(1):61–83
Gao Y, Shang HL, Yang Y (2019) High-dimensional functional time series forecasting: an application to age-specific mortality rates. J Multivar Anal 170:232–243
Gao Y, Shang HL, Yang Y (2021) Factor-augmented smoothing model for functional data. arXiv preprint arXiv:2102.02580
Gervini D (2009) Detecting and handling outlying trajectories in irregularly sampled functional datasets. Ann Appl Stat 3(4):1758–1775
Guo S, Qiao X, Wang Q (2021) Factor modelling for high-dimensional functional time series. arXiv preprint arXiv:2112.13651
Hall P, Hosseini-Nasab M (2006) On properties of functional principal components analysis. J R Stat Soc Ser B Stat Methodol 68(1):109–126
Hall P, Müller HG, Wang JL (2006) Properties of principal component methods for functional and longitudinal data analysis. Ann Stat 34(3):1493–1517
Hallin M, Nisol G, Tavakoli S (2023) Factor models for high-dimensional functional time series I: representation results. J Time Ser Anal 44(5–6):578–600
Han F, Liu H (2018) ECA: high-dimensional elliptical component analysis in non-Gaussian distributions. J Am Stat Assoc 113(521):252–268
Hays S, Shen H, Huang JZ (2012) Functional dynamic factor models with application to yield curve forecasting. Ann Appl Stat 6(3):870–894
He Y, Kong X, Yu L, Zhang X (2022) Large-dimensional factor analysis without moment constraints. J Bus Econ Stat 40(1):302–312
He Y, Li L, Liu D, Zhou WX (2023) Huber principal component analysis for large-dimensional factor models. arXiv preprint arXiv:2303.02817
Horváth L, Kokoszka P (2012) Inference for functional data with applications. Springer, Berlin
Horváth L, Li B, Li H, Liu Z (2020) Time-varying beta in functional factor models: evidence from China. N Am J Econ Finance 54:101283
Kokoszka P, Miao H, Zhang X (2015) Functional dynamic factor model for intraday price curves. J Financ Econom 13(2):456–477
Kowal DR, Canale A (2021) Semiparametric functional factor models with Bayesian rank selection. arXiv preprint arXiv:2108.02151
Li G, Huang JZ, Shen H (2018) Exponential family functional data analysis via a low-rank model. Biometrics 74(4):1301–1310
Li D, Qiao X, Wang Z (2023) Factor-guided estimation of large covariance matrix function with conditional functional sparsity. arXiv preprint arXiv:2311.02450
Ling N, Vieu P (2018) Nonparametric modelling for functional data: selected survey and tracks for future. Statistics 52(4):934–949
Lu J, Han F, Liu H (2021) Robust scatter matrix estimation for high dimensional distributions with heavy tail. IEEE Trans Inf Theory 67(8):5283–5304
Otto S, Salish N (2022) Approximate factor models for functional time series. arXiv preprint arXiv:2201.02532
Park Y, Oh HS, Lim Y (2024) A data-adaptive dimension reduction for functional data via penalized low-rank approximation. Stat Comput 34(36):66
Ran H, Bai Y (2021) On soft Bayesian additive regression trees and asynchronous longitudinal regression analysis. arXiv preprint arXiv:2108.11603
Sawant P, Billor N, Shin H (2012) Functional outlier detection with robust functional principal component analysis. Comput Stat 27:83–102
Stock JH, Watson MW (2012) Dynamic factor models. Oxford University Press, Oxford
Stock JH, Watson MW (2016) Dynamic factor models, factor-augmented vector autoregressions, and structural vector autoregressions in macroeconomics. In: Handbook of macroeconomics, vol 2. Elsevier, pp 415–525
Tang C, Shang HL, Yang Y (2021) Multi-population mortality forecasting using high-dimensional functional factor models. arXiv preprint arXiv:2109.04146
Tang C, Shang HL, Yang Y (2022) Clustering and forecasting multiple functional time series. Ann Appl Stat 16(4):2523–2553
Tavakoli S, Nisol G, Hallin M (2019) High-dimensional functional factor models. arXiv preprint arXiv:1905.10325
Tavakoli S, Nisol G, Hallin M (2023) Factor models for high-dimensional functional time series II: estimation and forecasting. J Time Ser Anal 44(5–6):600–621
Wang D, Liu X, Chen R (2019) Factor models for matrix-valued high-dimensional time series. J Econom 208(1):231–248
Wang G, Liu S, Han F, Di C (2021) Robust functional principal component analysis via functional pairwise spatial signs. arXiv preprint arXiv:2101.06415
Wen S, Lin H (2022) Factor-guided functional PCA for high-dimensional functional data. arXiv preprint arXiv:2211.12012
Wohl DA, Zeng D, Stewart P, Glomb N, Alcorn T, Jones S, Handy J, Fiscus S, Weinberg A, Gowda D et al (2005) Cytomegalovirus viremia, mortality, and end-organ disease among patients with AIDS receiving potent antiretroviral therapies. J Acquir Immune Defic Syndr 38(5):538–544
Yang X, Du L (2023) Robust multiple testing under high-dimensional dynamic factor model. arXiv preprint arXiv:2303.07631
Yao F, Müller HG, Wang JL (2005) Functional data analysis for sparse longitudinal data. J Am Stat Assoc 100(470):577–590
Yu L, He Y, Zhang X (2019) Robust factor number specification for large-dimensional elliptical factor model. J Multivar Anal 174:104543
Zhong R, Liu S, Li H, Zhang J (2022a) Functional principal component analysis estimator for non-Gaussian data. J Stat Comput Simul 92(13):2788–2801
Zhong R, Liu S, Li H, Zhang J (2022b) Robust functional principal component analysis for non-Gaussian longitudinal data. J Multivar Anal 189:104864

Acknowledgements

The authors thank the Editor-in-Chief, an Associate Editor, and two anonymous reviewers for many helpful and constructive comments. This research was sponsored by the National Natural Science Foundation of China (Grant No. 72071068) and China Scholarship Council (202206690042).

Author information

Authors and Affiliations

School of Mathematics, Hefei University of Technology, Hefei, 230009, China
Shuquan Yang & Nengxiang Ling

Corresponding author

Correspondence to Nengxiang Ling.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Proof of Lemma 3.1 Based on Assumptions 3.1–3.3, we can get $N(C_2+o(1))\le \lambda _i(\Gamma )\le N(C_1+o(1))$ for $i\le k$; $C_2\le \lambda _i(\Gamma )\le C_1$ for $k<i\le N$; and $\lambda _i(\Gamma )\le C_1$ for $i>N$. Moreover, by Theorem 2.4 in Zhong et al. (2022b), $ \lambda _{i}(K)=\mathbb {E}\left[ \frac{\lambda _{i}(\Gamma ) U_{i}^{2}}{\sum _{i=1}^{\infty } \lambda _{i}(\Gamma ) U_{i}^{2}}\right] $ with $ U_{i}=\frac{\langle Y-{\tilde{Y}}, \phi _{i}\rangle }{\sqrt{2 \lambda _{i}(\Gamma )} }$ and $ \sum _{i=1}^{\infty } \lambda _{i}(K)=1 $. Thus, for $i\le k$,

$$\begin{aligned}\lambda _i(K)\le \frac{\lambda _1(\Gamma )}{\lambda _k(\Gamma )} \mathbb {E}\left[ \frac{U_{i}^{2}}{\sum _{i=1}^{\infty } U_{i}^{2}}\right] \le \frac{C_1}{kC_2}+o(1),\end{aligned}$$

$$\begin{aligned} \begin{aligned} \lambda _i(K)\ge&\mathbb {E}\left[ \frac{\lambda _k(\Gamma )U_{i}^{2}}{\sum _{i=1}^{k} \lambda _i(\Gamma ) U_{i}^{2}+\sum _{i=k+1}^{\infty }C_1 U_{i}^{2}}\right] \\=&\mathbb {E}\left[ \frac{\lambda _k(\Gamma )U_{i}^{2}}{\sum _{i=1}^{k} (\lambda _1(\Gamma )-C_1 )U_{i}^{2}+\sum _{i=1}^{\infty }C_1 U_{i}^{2}}\right] \\=&\frac{\lambda _k(\Gamma )}{\lambda _1(\Gamma )-C_1}\mathbb {E}\left[ \frac{a_1U_{i}^{2}}{\sum _{i=1}^{k} a_1U_{i}^{2}+ 1}\right] , \end{aligned} \end{aligned}$$

where $a_1=\frac{\lambda _1(\Gamma )}{C_1}-1$, and we have

$$\begin{aligned}&k\mathbb {E}\left[ \frac{a_1U_{i}^{2}}{\sum _{i=1}^{k} a_1U_{i}^{2}+ 1}\right] +\mathbb {E}\left[ \frac{1}{\sum _{i=1}^{k} a_1U_{i}^{2}+ 1}\right] =1,\\&\mathbb {E}\left[ \frac{1}{\sum _{i=1}^{k} a_1 U_{i}^{2}+ 1}\right] \le \mathbb {E}\left[ \frac{1}{1+a_1 U_{i}^{2}}\right] \le \mathbb {P}(a_1 U_{i}^{2}\le 1)+\frac{1}{2}\mathbb {P}(a_1 U_{i}^{2}> 1)=\frac{1}{2}\mathbb {P}(a_1 U_{i}^{2}\le 1)+\frac{1}{2} \le \frac{1}{2} +o(1).\end{aligned}$$

Similarly,

$$\begin{aligned}\mathbb {E}\left[ \frac{a_1U_{i}^{2}}{\sum _{i=1}^{k} a_1U_{i}^{2}+ 1}\right] \ge \frac{1}{k}(\frac{1}{2}+o(1)),\end{aligned}$$

which implies that $\lambda _i(K)\ge \frac{C_2}{2kC_1}+o(1)$.
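
The identity behind the last two displays can be checked termwise. The following is a short sketch, assuming (as the displays implicitly do) that $U_1,\dots ,U_k$ are identically distributed, so that the expectation does not depend on $i\le k$; the shorthand $S_k$ is introduced here only for illustration. Write $S_k=\sum _{j=1}^{k} a_1U_{j}^{2}$. Then

$$\begin{aligned} \sum _{j=1}^{k}\frac{a_1U_{j}^{2}}{S_k+1}+\frac{1}{S_k+1}=\frac{S_k+1}{S_k+1}=1, \end{aligned}$$

and taking expectations gives, for any fixed $i\le k$,

$$\begin{aligned} \mathbb {E}\left[ \frac{a_1U_{i}^{2}}{S_k+1}\right] =\frac{1}{k}\left( 1-\mathbb {E}\left[ \frac{1}{S_k+1}\right] \right) \ge \frac{1}{k}\left( \frac{1}{2}+o(1)\right) , \end{aligned}$$

where the inequality uses the upper bound $\mathbb {E}\left[ \frac{1}{S_k+1}\right] \le \frac{1}{2}+o(1)$.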

For $k<i\le N$, we have

$$\begin{aligned}\lambda _i(K)\le \mathbb {E}\left[ \frac{C_1U_{i}^{2}}{\sum _{i=1+k}^{N} C_2U_{i}^{2}}\right] =\frac{C_1}{(N-k)C_2}=\frac{C_1}{NC_2}(1+o(1))=O\left( \frac{1}{N}\right) .\end{aligned}$$

For $i> N$, we have

$$\begin{aligned}\lambda _i(K)\le \lambda _N(K)=O\left( \frac{1}{N}\right) .\end{aligned}$$

Proof of Theorem 3.1

By Lemma 2.1, Lemma 3.1, and Weyl’s theorem, $ \lambda _{r}({\widehat{K}}) \asymp 1, r \le k $ and $\lambda _{r}({\widehat{K}})=O_{p}(N^{-1 / 2}), r>k $. Let $\alpha =N^{-1 / 2}$, then

$$\begin{aligned} \max _{i < k} \frac{\lambda _{i}({\widehat{K}})}{\lambda _{i+1}({\widehat{K}})+c \alpha }=O_{p}(1), \quad \max _{i >k} \frac{\lambda _{i}({\widehat{K}})}{\lambda _{i+1}({\widehat{K}})+c \alpha } \lesssim O_{p}(1). \end{aligned}$$

Therefore,

$$\begin{aligned} \frac{\lambda _{k}({\widehat{K}})}{\lambda _{k+1}({\widehat{K}})+c \alpha } \ge c \alpha ^{-1} \rightarrow \infty , \end{aligned}$$

which establishes the consistency of the estimated number of factors.
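
To make the ratio criterion in this proof concrete, here is a minimal numerical sketch in Python. It is not the authors' implementation. It assumes curves observed on a common equally spaced grid, takes the sample pairwise spatial sign operator to be the average of $(Y_i-Y_j)\otimes (Y_i-Y_j)/\Vert Y_i-Y_j\Vert ^2$ over pairs $i<j$, and uses $\alpha =N^{-1/2}$ with an illustrative constant $c=1$. The function names, the cap `k_max`, and the simulated data are all hypothetical.

```python
import numpy as np

def pairwise_spatial_sign_kernel(Y, dt):
    """Kernel matrix of the sample pairwise spatial sign operator.

    Y  : (N, T) array, N curves on an equally spaced grid (assumption).
    dt : grid spacing, used for Riemann-sum L2 norms.
    """
    i, j = np.triu_indices(Y.shape[0], k=1)
    D = Y[i] - Y[j]                                # all pairwise differences
    norms = np.sqrt((D ** 2).sum(axis=1) * dt)     # L2 norm of each difference
    keep = norms > 0                               # skip identical curves
    D = D[keep] / norms[keep][:, None]             # functional spatial signs
    return D.T @ D / D.shape[0]

def eigenvalue_ratio_estimate(K_hat, dt, N, k_max=8, c=1.0):
    """Maximise lambda_i / (lambda_{i+1} + c * alpha) over i = 1..k_max."""
    lam = np.linalg.eigvalsh(K_hat * dt)[::-1]     # descending eigenvalues
    lam = np.clip(lam, 0.0, None)                  # remove tiny negative round-off
    alpha = float(N) ** -0.5
    ratios = lam[:k_max] / (lam[1:k_max + 1] + c * alpha)
    return int(np.argmax(ratios)) + 1, lam

# Illustrative simulation (not from the paper): 2 factors, t_3 loadings.
rng = np.random.default_rng(0)
N, T = 200, 50
grid = np.linspace(0.0, 1.0, T)
dt = grid[1] - grid[0]
F = np.vstack([np.sqrt(2) * np.sin(np.pi * grid),
               np.sqrt(2) * np.sin(2 * np.pi * grid)])
L = rng.standard_t(df=3, size=(N, 2))              # heavy-tailed loadings
Y = L @ F + 0.05 * rng.standard_normal((N, T))     # small idiosyncratic noise

K_hat = pairwise_spatial_sign_kernel(Y, dt)
k_hat, lam = eigenvalue_ratio_estimate(K_hat, dt, N)
```

Because every pairwise difference is normalised to unit norm, the eigenvalues of the discretised operator sum to one, which mirrors $\sum _{i}\lambda _{i}(K)=1$ used in the proof of Lemma 3.1.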

Proof of Theorem 3.2 Similar to the proof of Lemma 2.3 in Horváth and Kokoszka (2012), we have

$$\begin{aligned} \max _{1 \le h \le k}\left\| {\widehat{f}}_{h}-s_h f_{h}\right\| \le \frac{2 \sqrt{2}}{\alpha }\left\| {\widehat{K}}-K\right\| _{\mathcal {S}}, \end{aligned}$$

where $ \alpha =\min \left\{ \lambda _{1}-\lambda _{2}, \ldots , \lambda _{k-1}-\lambda _{k}, \lambda _{k}\right\} $, and the Hilbert-Schmidt norm of a Hilbert-Schmidt operator $S$ is defined by

$$\begin{aligned} \Vert S\Vert _{\mathcal {S}}^{2}=\sum _{j=1}^{\infty }\left\| S\left( e_{j}\right) \right\| ^{2}, \end{aligned}$$

where $ \left\{ e_{1}, e_{2}, \ldots \right\} $ is any orthonormal basis. Then the asymptotic result of the factors follows from Lemma 2.1. In addition, by Assumption 3.3, $\varvec{S}=\textrm{sgn}(\frac{1}{N} \sum _{i=1}^{N}(\widehat{\varvec{l}}_i \varvec{l}_i^{\top })) =\textrm{diag}\{s_1,\dots ,s_k\}$ with entries $\pm 1$. Then,

$$\begin{aligned} \begin{aligned} \frac{1}{N}\sum _{i=1}^{N}\Vert \widehat{\varvec{l}}_i - \varvec{Sl}_i \Vert ^2&=\frac{1}{N}\sum _{i=1}^{N}(\widehat{\varvec{l}}_i - \varvec{Sl}_i)^{\top }(\widehat{\varvec{l}}_i - \varvec{Sl}_i) \\&=\frac{1}{N}\sum _{i=1}^{N}\sum _{h=1}^{k}({\widehat{l}}_{ih}-s_h l_{ih})^2. \end{aligned} \end{aligned}$$

Recall that ${\widehat{l}}_{ih}=\langle Y_{i}, {\widehat{f}}_{h}\rangle $. Then $\frac{1}{N}\sum _{i=1}^{N}\sum _{h=1}^{k}({\widehat{l}}_{ih}-s_h l_{ih})^2=\frac{1}{N}\sum _{i=1}^{N}\sum _{h=1}^{k} \int _\mathcal {I}(Y_i(t)({\widehat{f}}_h(t)-s_h f_h(t))+\epsilon _i(t)s_h f_h(t)+s_h l_{ih}(f_h(t)f_h(t)-1))\,\mathrm {d}t=O_p(N^{-1})$, where the last equality follows from Assumption 3.1 and the orthonormality of the factor curves. This concludes the proof of the theorem.
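
The Hilbert-Schmidt norm used in this proof does not depend on the chosen orthonormal basis, which can be checked numerically. The sketch below is only an illustration: it assumes an integral operator with kernel given on an equally spaced grid and approximates all integrals by Riemann sums. The function names and the example kernel $\min (s,t)$ are hypothetical choices, not taken from the paper.

```python
import numpy as np

def hs_norm_sq_from_kernel(K, dt):
    """Squared HS norm as the double integral of K(s, t)^2 (Riemann sum)."""
    return float((K ** 2).sum() * dt * dt)

def hs_norm_sq_from_basis(K, dt, basis):
    """Squared HS norm as sum_j ||S(e_j)||^2.

    basis : (T, T) array whose columns e_j satisfy sum_t e_j e_l dt = delta_jl.
    """
    images = (K @ basis) * dt          # S(e_j) evaluated on the grid
    return float((images ** 2).sum() * dt)

rng = np.random.default_rng(1)
T = 40
grid = np.linspace(0.0, 1.0, T)
dt = grid[1] - grid[0]
K = np.minimum.outer(grid, grid)       # example kernel min(s, t)

# Two different discrete orthonormal bases (orthonormal w.r.t. the dt-weighted sum).
basis_a = np.eye(T) / np.sqrt(dt)
Q, _ = np.linalg.qr(rng.standard_normal((T, T)))
basis_b = Q / np.sqrt(dt)

hs_kernel = hs_norm_sq_from_kernel(K, dt)
hs_a = hs_norm_sq_from_basis(K, dt, basis_a)
hs_b = hs_norm_sq_from_basis(K, dt, basis_b)
```

On the grid, all three quantities reduce to the squared Frobenius norm of the kernel matrix times `dt**2`, so they agree up to floating-point error.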

Proof of Corollary 3.1 By Theorem 3.2 and the triangle inequality, we have

$$\begin{aligned} \begin{aligned} \frac{1}{N}\sum _{i=1}^{N}\left\| \sum _{h=1}^{k} {\widehat{l}}_{ih}{\widehat{f}}_h-l_{ih} f_{h}\right\| ^2&=\frac{1}{N}\sum _{i=1}^{N}\left\| \sum _{h=1}^{k} {\widehat{l}}_{ih}{\widehat{f}}_h-s_h{\widehat{l}}_{ih}f_h+s_h{\widehat{l}}_{ih}f_h-l_{ih} f_{h}\right\| ^2 \\&\le \frac{1}{N}\sum _{i=1}^{N}\sum _{h=1}^{k}l_{ih}^2\Vert {\widehat{f}}_{h}-s_h f_{h}\Vert ^2+\frac{1}{N}\sum _{i=1}^{N}\sum _{h=1}^{k}(s_h {\widehat{l}}_{ih}-l_{ih})^2 \Vert f_h\Vert ^2\\&=O_{p}\left( \frac{1}{N}\right) . \end{aligned} \end{aligned}$$
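
The quantities bounded in Theorem 3.2 and Corollary 3.1 can be computed on a grid as follows. This is a hedged sketch rather than the paper's two-step procedure: it assumes the estimated factor curves ${\widehat{f}}_h$ are the leading eigenfunctions of the sample pairwise spatial sign operator, uses ${\widehat{l}}_{ih}=\langle Y_i,{\widehat{f}}_h\rangle $ as recalled in the proof of Theorem 3.2, and forms the common component as $\sum _{h}{\widehat{l}}_{ih}{\widehat{f}}_h$. The function name and the simulated data are illustrative assumptions.

```python
import numpy as np

def estimate_factor_space(Y, dt, k):
    """Return estimated factor curves (k, T), loadings (N, k), common components (N, T)."""
    i, j = np.triu_indices(Y.shape[0], k=1)
    D = Y[i] - Y[j]                                # pairwise differences
    norms = np.sqrt((D ** 2).sum(axis=1) * dt)
    keep = norms > 0
    D = D[keep] / norms[keep][:, None]             # functional spatial signs
    K_hat = D.T @ D / D.shape[0]                   # sample operator kernel
    _, vecs = np.linalg.eigh(K_hat * dt)           # ascending eigenvalues
    V = vecs[:, ::-1][:, :k]                       # leading k eigenvectors
    F_hat = V.T / np.sqrt(dt)                      # rescale to unit L2 norm
    L_hat = (Y @ F_hat.T) * dt                     # l_ih = <Y_i, f_h>
    return F_hat, L_hat, L_hat @ F_hat

# Illustrative simulation (not from the paper): 2 factors, t_3 loadings.
rng = np.random.default_rng(2)
N, T, k = 100, 50, 2
grid = np.linspace(0.0, 1.0, T)
dt = grid[1] - grid[0]
F = np.vstack([np.sqrt(2) * np.sin(np.pi * grid),
               np.sqrt(2) * np.sin(2 * np.pi * grid)])
L = rng.standard_t(df=3, size=(N, k))
Y = L @ F + 0.05 * rng.standard_normal((N, T))

F_hat, L_hat, common_hat = estimate_factor_space(Y, dt, k)
true_common = L @ F
rel_err = ((common_hat - true_common) ** 2).mean() / (true_common ** 2).mean()
gram = F_hat @ F_hat.T * dt                        # should be close to identity
```

The common component is invariant to the signs $s_h$ and to rotations within the estimated factor space, which is why the check compares common components rather than individual loadings or factor curves.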

See Tables 7, 8, 9, 10, 11, 12 and 13.

Table 7 Simulation results for estimating the factor loadings, factor scores, and common components in Scenario III over 500 repetitions, $k=3$

Table 8 Simulation results for estimating the factor loadings, factor scores, and common components in Scenario I over 500 repetitions, $k=2$

Table 9 Simulation results for estimating the factor loadings, factor scores, and common components in Scenario II over 500 repetitions, $k=2$

Table 10 Simulation results for estimating the factor loadings, factor scores, and common components in Scenario III over 500 repetitions, $k=2$

Table 11 Simulation results for estimating the factor loadings, factor scores, and common components in Scenario I over 500 repetitions, $k=4$

Table 12 Simulation results for estimating the factor loadings, factor scores, and common components in Scenario II over 500 repetitions, $k=4$

Table 13 Simulation results for estimating the factor loadings, factor scores, and common components in Scenario III over 500 repetitions, $k=4$

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yang, S., Ling, N. Robust estimation of functional factor models with functional pairwise spatial signs. Comput Stat (2024). https://doi.org/10.1007/s00180-024-01477-2

Received: 21 November 2023
Accepted: 19 February 2024
Published: 13 March 2024
DOI: https://doi.org/10.1007/s00180-024-01477-2
