Abstract
This paper studies factor modeling for a vector of time series with long-memory properties. It investigates how outliers affect the identification of the number of factors and proposes a robust method to reduce their impact. The number of factors is estimated through an eigenvalue analysis of a non-negative definite matrix introduced by Lam et al. (2011). Two estimators are proposed: the first is based on the classical sample covariance function, and the second uses a robust covariance function estimate. In both cases, the eigenvalue estimates are shown to have similar convergence rates. Simulations support both estimators for multivariate stationary long-memory time series and show that the robust method is preferable when the data are contaminated with additive outliers. Time series of daily log returns are used as an application example. In addition to abrupt observations, exchange rates exhibit non-stationary behavior, with long-memory parameters greater than one. Semi-parametric long-memory estimators are therefore used to estimate the fractional parameters of the series. The number of factors is estimated using both the classical and the robust approaches. Owing to the influence of the abrupt observations, the two approaches suggest different numbers of factors: the robust method suggests two factors, while the classical approach indicates only one.
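To make the eigenvalue analysis concrete, here is a minimal Python sketch of one way to estimate the number of factors from lagged sample autocovariance matrices. It assumes the matrix M = Σ_h Γ̂(h)Γ̂(h)ᵀ and the eigenvalue-ratio criterion of Lam and Yao (2012); the exact criterion used in this paper is not stated in the abstract, and the function names, the lag cut-off `h_max`, and the toy data with short-memory AR(1) factors are illustrative assumptions only.

```python
import numpy as np


def lagged_autocov(y, h):
    """Classical sample autocovariance matrix at lag h for an (n, p) array."""
    n = y.shape[0]
    yc = y - y.mean(axis=0)
    return yc[: n - h].T @ yc[h:] / n


def estimate_num_factors(y, h_max=5):
    """Eigenvalue-ratio estimate of the number of factors (assumed criterion)."""
    p = y.shape[1]
    m = np.zeros((p, p))
    for h in range(1, h_max + 1):
        g = lagged_autocov(y, h)
        m += g @ g.T  # each term is symmetric and non-negative definite
    eigvals = np.sort(np.linalg.eigvalsh(m))[::-1]  # descending order
    r_max = max(1, p // 2)  # search only the leading half of the spectrum
    ratios = eigvals[1 : r_max + 1] / eigvals[:r_max]
    return int(np.argmin(ratios)) + 1, eigvals


# Toy example: p = 8 series driven by r = 2 AR(1) factors plus white noise.
rng = np.random.default_rng(0)
n, p, r = 2000, 8, 2
shocks = rng.standard_normal((n, r))
factors = np.zeros((n, r))
for t in range(1, n):
    factors[t] = 0.8 * factors[t - 1] + shocks[t]
# Fixed, mutually orthogonal loading columns (hypothetical choice).
loadings = np.column_stack([np.ones(p), np.tile([1.0, -1.0], p // 2)])
y = factors @ loadings.T + rng.standard_normal((n, p))

r_hat, eigvals = estimate_num_factors(y)
print("estimated number of factors:", r_hat)
```

A robust variant would replace `lagged_autocov` with a robust autocovariance estimate and leave the eigenvalue step unchanged.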
References
Arcones MA (1994) Limit theorems for nonlinear functionals of a stationary Gaussian sequence of vectors. Ann Probab 22(4):2242–2274. https://doi.org/10.1214/aop/1176988503
Bai J, Wang P (2016) Econometric analysis of large factor models. Annu Rev Econom 8(1):53–80
Bai X, Zheng L (2023) Robust factor models for high-dimensional time series and their forecasting. Commun Stat 52(19):1–14. https://doi.org/10.1080/03610926.2022.2033777
Brillinger DR (1981) Time series, data analysis and theory. Cambridge Series in Statistical and Probabilistic
Brockwell P, Davis R (2009) Time series: theory and methods. Springer Series in Statistics. Springer, New York
Chen EY, Tsay RS, Chen R (2020) Constrained factor models for high-dimensional matrix-variate time series. J Am Stat Assoc 115(530):775–793
Christou E (2020) Robust dimension reduction using sliced inverse median regression. Stat Pap 61:1799–1818. https://doi.org/10.1007/s00362-018-1007-z
Chung C (2002) Sample means, sample autocovariances, and linear regression of stationary multivariate long memory processes. Econ Theory 18:51–78
Cotta H, Reisen V, Bondon P et al (2017) tsqn: applications of the Qn estimator to time series (univariate and multivariate). R package version 1.0.0. https://CRAN.R-project.org/package=tsqn
Cotta H, Reisen VA, Bondon P (2023) A robust alternative method for the estimation of the covariance and the correlation matrices for multivariate time series. Unpublished. https://doi.org/10.13140/RG.2.2.24313.65129
Eichler M, Motta G, von Sachs R (2011) Fitting dynamic factor models to non-stationary time series. J Econometr 163(1):51–70
Fan J, Wang K, Zhong Y et al (2021) Robust high dimensional factor models with applications to statistical machine learning. Stat Sci 36(2):303–327
Fernández-Macho FJ (1997) A dynamic factor model for economic time series. Kybernetika 33(6):583–606
Geweke J (1977) The dynamic factor analysis of economic time series. In: Aigner DJ, Goldberger AS (eds) Latent variables in socio-economic models. North-Holland, Amsterdam, pp 365–383
Geweke JF, Singleton KJ (1981) Latent variable models for time series: A frequency domain approach with an application to the permanent income hypothesis. J Econometr 17(3):287–304
Hallin M, Liška R (2007) Determining the number of factors in the general dynamic factor model. J Am Stat Assoc 102(478):603–617
He Y, Wang Y, Yu L et al (2022) Matrix Kendall’s tau in high-dimensions: a robust statistic for matrix factor model
Lam C, Yao Q (2012) Factor modeling for high-dimensional time series: inference for the number of factors. Ann Stat 40(2):694–726. https://doi.org/10.1214/12-AOS970
Lam C, Yao Q, Bathia N (2011) Estimation of latent factors for high-dimensional time series. Biometrika 98(4):901–918
Lévy-Leduc C, Boistard H, Moulines E et al (2011a) Large sample behavior of some well-known robust estimators under long-range dependence. Statistics 45(1):59–71
Lévy-Leduc C, Boistard H, Moulines E et al (2011b) Robust estimation of the scale and of the autocovariance function of Gaussian short- and long-range dependent processes. J Time Ser Anal 32(2):135–156
Lévy-Leduc C, Bondon P, Reisen VA (2022) Robust autocovariance estimation from the frequency domain for univariate stationary time series. J Stat Plan Inference 221:281–298
Lin TI, Chen IA, Wang WL (2022) A robust factor analysis model based on the canonical fundamental skew-t distribution. Stat Pap 1–27
Ma Y, Genton MG (2000) Highly robust estimation of the autocovariance function. J Time Ser Anal 21:663–684
Molinares FF, Reisen VA, Cribari-Neto F (2009) Robust estimation in long-memory processes under additive outliers. J Stat Plan Inference 139(8):2511–2525
Peña D, Box GEP (1987) Identifying a simplifying structure in time series. J Am Stat Assoc 82(399):836–843
Priestley M, Rao T, Tong H (1974) Applications of principal component analysis and factor analysis in the identification of multivariable systems. IEEE Trans Autom Control 19(6):730–734
Reinsel G (2003) Elements of multivariate time series analysis. Springer Series in Statistics. Springer, New York
Reisen VA, Lévy-Leduc C, Taqqu MS (2017) An M-estimator for the long-memory parameter. J Stat Plan Inference 187:44–55
Reisen VA, Sgrâncio AM, Lévy-Leduc C et al (2019) Robust factor modeling for high-dimensional time series: an application to air pollution data. Appl Math Comput 346:842–852
Reisen VA, Lévy-Leduc C, Bondon P et al (2020) An overview of robust spectral estimators. In: Chaari F, Leskow J, Zimroz R, Wyłomańska A, Dudek A (eds) Cyclostationarity: theory and methods - IV. CSTA 2017. Applied Condition Monitoring, vol 16. Springer, Cham. https://doi.org/10.1007/978-3-030-22529-2_12
Rooch A, Zelo I, Fried R (2019) Estimation methods for the LRD parameter under a change in the mean. Stat Pap 60:313–347. https://doi.org/10.1007/s00362-016-0839-7
Ross SA (1976) The arbitrage theory of capital asset pricing. J Econ Theory 13(3):341–360
Rousseeuw PJ, Croux C (1993) Alternatives to the median absolute deviation. J Am Stat Assoc 88(424):1273–1283
Taqqu MS (1975) Weak convergence to fractional Brownian motion and to the Rosenblatt process. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 31(4):287–302. https://doi.org/10.1007/BF00532868
Toman A (2014) Robust confirmatory factor analysis based on the forward search algorithm. Stat Pap 55:233–252
Velu RP, Reinsel GC, Wichern DW (1986) Reduced rank models for multiple time series. Biometrika 73(1):105–118
Acknowledgements
The authors gratefully acknowledge CNPq, CAPES and FAPES for their financial support. This research was also partially supported by the DATAIA Convergence Institute as part of the “Programme d’Investissement d’Avenir” (ANR17-CONV-0003) operated by CentraleSupélec. The authors thank the referees for their valuable suggestions, which helped to clarify and substantially improve the paper.
6 Appendix: Technical lemmas
Lemmas 1 and 2 were stated and proved in Reisen et al. (2019) and are recalled here for the reader's convenience.
Lemma 1
Let \(\widehat{\varvec{A}}_n\) be a sequence of \(p\times p\) symmetric matrices and \(\varvec{A}\) a \(p\times p\) symmetric matrix such that \(u_n(\widehat{\varvec{A}}_n-\varvec{A})=O_p(1)\), where \(u_n\) is a sequence of positive numbers tending to infinity as n tends to infinity, then
where \((\lambda _j(\widehat{\varvec{A}}_n))_{1\le j\le p}\) and \((\lambda _j(\varvec{A}))_{1\le j\le p}\) are the eigenvalues of \(\widehat{\varvec{A}}_n\) and \(\varvec{A}\), respectively.
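One standard tool behind eigenvalue-stability statements of this kind is Weyl's inequality: for symmetric matrices with eigenvalues sorted in the same order, \(|\lambda _j(\widehat{\varvec{A}}_n)-\lambda _j(\varvec{A})|\le \Vert \widehat{\varvec{A}}_n-\varvec{A}\Vert _2\). The Python sketch below checks this numerically for a perturbation of size \(1/\sqrt{n}\). It is an illustration under assumed toy matrices, not the argument given in Reisen et al. (2019).

```python
import numpy as np

rng = np.random.default_rng(1)
p = 6

# A fixed symmetric "population" matrix (hypothetical example).
b = rng.standard_normal((p, p))
a = (b + b.T) / 2

results = []
for n in (100, 10_000, 1_000_000):
    # Symmetric perturbation of order 1 / sqrt(n), mimicking u_n = sqrt(n).
    e = rng.standard_normal((p, p))
    a_hat = a + (e + e.T) / (2 * np.sqrt(n))
    # eigvalsh returns eigenvalues in ascending order for both matrices.
    eig_gap = np.max(np.abs(np.linalg.eigvalsh(a_hat) - np.linalg.eigvalsh(a)))
    op_norm = np.linalg.norm(a_hat - a, ord=2)  # spectral norm
    results.append((n, eig_gap, op_norm))
    print(f"n={n:>9}: max eigenvalue gap={eig_gap:.2e}, spectral norm={op_norm:.2e}")
```

In every row the largest eigenvalue gap is bounded by the spectral norm of the perturbation, so a rate for the matrix carries over to its eigenvalues.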
Lemma 2
Let \(\widehat{\varvec{A}}_n(h)\) be a sequence of \(p\times p\) symmetric matrices and \(\varvec{A}(h)\) a \(p\times p\) symmetric matrix such that \(u_n(\widehat{\varvec{A}}_n(h)-\varvec{A}(h))=O_p(1)\), for each fixed \(h\in \{1,\ldots ,h_{\max }\}\), where \(u_n\) is a sequence of positive numbers tending to infinity as n tends to infinity, then
as n tends to infinity.
Lemma 3
Let h be a non-negative integer and let i and j be two integers in \(\{1,\ldots ,k\}\). If (2) holds, then the autocovariance estimator \({\widehat{\gamma }}^{Q}_{i,j}(h)\) defined in (10) satisfies the following limit theorems as n tends to infinity.
(i) If, for all i in \(\{1,\ldots ,k\}\), \(D_i>1/2\),
$$\begin{aligned} \sqrt{n}({\widehat{\gamma }}^{Q}_{i,j}(h)-\gamma _{i,j}(h)){\mathop {\longrightarrow }\limits ^{d}}{\mathcal {N}}(0,{\widetilde{\sigma }}_{i,j}^2(h)),\text { as } n\rightarrow \infty , \end{aligned}$$
where
$$\begin{aligned} {\widetilde{\sigma }}_{i,j}^2(h)={\mathbb {E}}[\psi (Y_{i,1},Y_{j,1+h})^2]+2\sum _{k\ge 1}{\mathbb {E}}[\psi (Y_{i,1},Y_{j,1+h})\psi (Y_{i,k+1},Y_{j,k+1+h})], \end{aligned}$$
where \(\psi \) is
$$\begin{aligned} \psi (x,y)&=\frac{1}{2}\left( \gamma _{i,i}(0)+\gamma _{j,j}(0)+2\gamma _{i,j}(h)\right) \text {IF}\left( \frac{x+y}{\sqrt{\gamma _{i,i}(0)+\gamma _{j,j}(0)+2\gamma _{i,j}(h)}},Q,\Phi \right) \\&\quad -\frac{1}{2}\left( \gamma _{i,i}(0)+\gamma _{j,j}(0)-2\gamma _{i,j}(h)\right) \text {IF}\left( \frac{x-y}{\sqrt{\gamma _{i,i}(0)+\gamma _{j,j}(0)-2\gamma _{i,j}(h)}},Q,\Phi \right) , \end{aligned}\tag{12}$$
and \(\text {IF}\) is defined in Lévy-Leduc et al. (2011b, Equation (20)).
(ii) If there exists \(i_0\) in \(\{1,\ldots ,k\}\) such that \(D_{i_0}<1/2\),
$$\begin{aligned} n^{D_{i_0}\wedge D_j}({\widehat{\gamma }}^{Q}_{i_0,j}(h)-\gamma _{i_0,j}(h))=O_P(1),\text { as } n\rightarrow \infty . \end{aligned}$$
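The following Python sketch illustrates a robust cross-autocovariance estimator of the kind covered by Lemma 3. Definition (10) is not reproduced in this appendix, so the sketch assumes the Ma and Genton (2000) construction, namely one quarter of the difference between the squared \(Q_n\) scales (Rousseeuw and Croux 1993) of \(Y_{i,t}+Y_{j,t+h}\) and \(Y_{i,t}-Y_{j,t+h}\). The function names, the naive \(O(n^2)\) computation of \(Q_n\), and the contamination scheme are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

import numpy as np

# Gaussian consistency factor 1 / (sqrt(2) * Phi^{-1}(5/8)), about 2.2191.
QN_CONSTANT = 1.0 / (sqrt(2.0) * NormalDist().inv_cdf(5.0 / 8.0))


def qn_scale(x):
    """Naive O(n^2) Qn scale: k-th smallest pairwise distance, rescaled."""
    x = np.asarray(x, dtype=float)
    n = x.size
    i, j = np.triu_indices(n, k=1)  # all pairs with i < j
    diffs = np.abs(x[i] - x[j])
    m = n // 2 + 1
    k = m * (m - 1) // 2  # k = C(m, 2)
    return QN_CONSTANT * np.partition(diffs, k - 1)[k - 1]


def robust_cross_autocov(yi, yj, h):
    """Assumed form: (Qn^2 of the sum - Qn^2 of the difference) / 4."""
    n = len(yi)
    a, b = yi[: n - h], yj[h:]
    return 0.25 * (qn_scale(a + b) ** 2 - qn_scale(a - b) ** 2)


def classical_cross_autocov(yi, yj, h):
    """Classical estimator: lagged mean product minus product of means."""
    n = len(yi)
    return np.sum(yi[: n - h] * yj[h:]) / n - yi.mean() * yj.mean()


# Toy data with gamma_{i,j}(1) = E[Y_{i,t} Y_{j,t+1}] = 0.5 and unit variances.
rng = np.random.default_rng(2)
n, h = 1000, 1
yi = rng.standard_normal(n)
yj = np.sqrt(0.75) * rng.standard_normal(n)
yj[h:] += 0.5 * yi[: n - h]

# Additive outliers of size 10 at 2% of the times, aligned at lag h.
idx = rng.choice(n - h, size=20, replace=False)
yi_c, yj_c = yi.copy(), yj.copy()
yi_c[idx] += 10.0
yj_c[idx + h] += 10.0

robust_clean = robust_cross_autocov(yi, yj, h)
robust_dirty = robust_cross_autocov(yi_c, yj_c, h)
classical_dirty = classical_cross_autocov(yi_c, yj_c, h)
print("robust (clean):        ", round(robust_clean, 3))
print("robust (contaminated): ", round(robust_dirty, 3))
print("classical (contaminated):", round(classical_dirty, 3))
```

The contamination is chosen so that the outliers line up at lag h, which is the situation in which the classical cross-product is most distorted.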
Proof of Lemma 3
Observe that the autocovariance \(\gamma _{i,j}^{(+)}(\ell )\) of the process \((Y_{i,t}+Y_{j,t+h})_{t\ge 1}\) is equal to
By (2) and a Taylor expansion, \(\gamma _{i,j}^{(+)}(\ell )\) is asymptotically proportional to \(\ell ^{-(D_i\wedge D_j)}\) as \(\ell \rightarrow \infty \). Hence, the process \((Y_{i,t}+Y_{j,t+h})_{t\ge 1}\) satisfies (Lévy-Leduc et al. 2011b, Assumption A2) with \(D=D_i\wedge D_j\). Since the autocovariance \(\gamma _{i,j}^{(-)}(\ell )\) of the process \((Y_{i,t}-Y_{j,t+h})_{t\ge 1}\) is equal to
by following the same lines, the process \((Y_{i,t}-Y_{j,t+h})_{t\ge 1}\) also satisfies (Lévy-Leduc et al. 2011b, Assumption A2) with \(D=D_i\wedge D_j\). In case (i), the proof follows the same lines as the proof of (i) in Lévy-Leduc et al. (2011b, Theorem 4). In case (ii), by applying the Delta method to Lévy-Leduc et al. (2011b, Equation (74)), we get
Similarly, we have that
which gives the result. \(\square \)
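As a worked check on the two autocovariances used at the start of this proof, the following sketch assumes centered processes and the convention \(\gamma _{i,j}(h)={\mathbb {E}}[Y_{i,t}Y_{j,t+h}]\) that appears in the proof of Lemma 4. The arrangement of terms is an assumption and may differ from the authors' own display. Expanding the product \((Y_{i,t}+Y_{j,t+h})(Y_{i,t+\ell }+Y_{j,t+h+\ell })\) and taking expectations gives

$$\begin{aligned} \gamma _{i,j}^{(+)}(\ell )=\gamma _{i,i}(\ell )+\gamma _{j,j}(\ell )+\gamma _{i,j}(h+\ell )+\gamma _{i,j}(h-\ell ), \end{aligned}$$

where the last term comes from \({\mathbb {E}}[Y_{j,t+h}Y_{i,t+\ell }]={\mathbb {E}}[Y_{i,t+\ell }Y_{j,(t+\ell )+(h-\ell )}]\). In the same way,

$$\begin{aligned} \gamma _{i,j}^{(-)}(\ell )=\gamma _{i,i}(\ell )+\gamma _{j,j}(\ell )-\gamma _{i,j}(h+\ell )-\gamma _{i,j}(h-\ell ). \end{aligned}$$

At \(\ell =0\) these reduce to \(\gamma _{i,i}(0)+\gamma _{j,j}(0)\pm 2\gamma _{i,j}(h)\), which are the variances appearing in (12).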
Lemma 4
Let h be a non-negative integer and let i and j be two integers in \(\{1,\ldots ,k\}\). If (2) holds, then the autocovariance estimator \({\widehat{\gamma }}_{i,j}(h)\) defined in (8) satisfies the following limit theorems as n tends to infinity.
(i) If, for all i in \(\{1,\ldots ,k\}\), \(D_i>1/2\),
$$\begin{aligned} \sqrt{n}({\widehat{\gamma }}_{i,j}(h)-\gamma _{i,j}(h)){\mathop {\longrightarrow }\limits ^{d}}{\mathcal {N}}(0,{\check{\sigma }}_{i,j}^2(h)),\text { as } n\rightarrow \infty , \end{aligned}$$
where
$$\begin{aligned} {\check{\sigma }}_{i,j}^2(h)&={\mathbb {E}}\left[ \left( Y_{i,1}Y_{j,h+1}-\gamma _{i,j}(h)\right) ^2\right] \\&\quad +2\sum _{k\ge 1}{\mathbb {E}}\left[ \left( Y_{i,1}Y_{j,h+1}-\gamma _{i,j}(h)\right) \left( Y_{i,1+k}Y_{j,1+h+k}-\gamma _{i,j}(h)\right) \right] . \end{aligned}$$
(ii) If there exists \(i_0\) in \(\{1,\ldots ,k\}\) such that \(D_{i_0}<1/2\),
$$\begin{aligned} n^{D_{i_0}\wedge D_j}({\widehat{\gamma }}_{i_0,j}(h)-\gamma _{i_0,j}(h))=O_P(1),\text { as } n\rightarrow \infty . \end{aligned}$$
Proof of Lemma 4
(i) Note that
$$\begin{aligned} {\widehat{\gamma }}_{i,j}(h)=\frac{1}{n}\sum _{t=1}^{n-h} Y_{i,t}Y_{j,t+h} -{\bar{Y}}_i{\bar{Y}}_j. \end{aligned}$$
By Theorem 5.1 of Taqqu (1975),
$$\begin{aligned} {\bar{Y}}_i=O_P(n^{-D_i/2}) \text { and } {\bar{Y}}_j=O_P(n^{-D_j/2}). \end{aligned}$$
Let \({\mathcal {Y}}_t=(Y_{i,t},Y_{j,t+h})\) and \(f:(x,y)\mapsto xy\). Then, by Theorem 4 of Arcones (1994), we get that
$$\begin{aligned} \frac{1}{\sqrt{n}}\sum _{t=1}^{n-h} \left( f({\mathcal {Y}}_t)-{\mathbb {E}}(f({\mathcal {Y}}_t))\right) =\frac{1}{\sqrt{n}}\sum _{t=1}^{n-h} \left( Y_{i,t}Y_{j,t+h}-\gamma _{i,j}(h)\right) {\mathop {\longrightarrow }\limits ^{d}}{\mathcal {N}}(0,{\check{\sigma }}_{i,j}^2(h)) \end{aligned}\tag{13}$$
since f is of Hermite rank 2, \(r^{(1,2)}(h)={\mathbb {E}}[Y_{i,t} Y_{j,t+h}]=\gamma _{i,j}(h)\), \(r^{(1,1)}(h)={\mathbb {E}}[Y_{i,t} Y_{i,t+h}]=\gamma _{i,i}(h)\), \(r^{(2,2)}(h)={\mathbb {E}}[Y_{j,t} Y_{j,t+h}]=\gamma _{j,j}(h)\) and \(D_i>1/2\), for all i. In (13),
$$\begin{aligned} {\check{\sigma }}_{i,j}^2(h)&={\mathbb {E}}\left[ \left( Y_{i,1}Y_{j,h+1}-\gamma _{i,j}(h)\right) ^2\right] \\&\quad +2\sum _{k\ge 1}{\mathbb {E}}\left[ \left( Y_{i,1}Y_{j,h+1}-\gamma _{i,j}(h)\right) \left( Y_{i,1+k}Y_{j,1+h+k}-\gamma _{i,j}(h)\right) \right] . \end{aligned}$$
(ii) Observe that
$$\begin{aligned} {\widehat{\gamma }}_{i,j}(h)=\frac{1}{4}\left( {{\widehat{\sigma }}_{n-h,Y_{i,1:n-h} +Y_{j,h+1:n}}}^2-{{\widehat{\sigma }}_{n-h,Y_{i,1:n-h}-Y_{j,h+1:n}}}^2\right) \end{aligned}\tag{14}$$
where
$$\begin{aligned} {{\widehat{\sigma }}_{n-h,Y_{i,1:n-h}+Y_{j,h+1:n}}}^2=:{\widehat{\sigma }}_{+,i,j}^2 =\frac{1}{n}\sum _{t=1}^{n-h}(Y_{i,t}+Y_{j,t+h})^2-({\bar{Y}}_i+{\bar{Y}}_j)^2 \\ \text { with } {\bar{Y}}_i=\frac{1}{n}\sum _{t=1}^nY_{i,t} \text { and } {\bar{Y}}_j=\frac{1}{n}\sum _{t=1}^nY_{j,t} \end{aligned}$$
and
$$\begin{aligned} {{\widehat{\sigma }}_{n-h,Y_{i,1:n-h}-Y_{j,h+1:n}}}^2=:{\widehat{\sigma }}_{-,i,j}^2 =\frac{1}{n}\sum _{t=1}^{n-h}(Y_{i,t}-Y_{j,t+h})^2-({\bar{Y}}_i-{\bar{Y}}_j)^2. \end{aligned}$$
By using the same arguments as those used in the proof of Lemma 3, we get that \((Y_{i,t}+Y_{j,t+h})_{t\ge 1}\) and \((Y_{i,t}-Y_{j,t+h})_{t\ge 1}\) satisfy (Lévy-Leduc et al. 2011b, Assumption A2) with \(D=D_i\wedge D_j\). Since \(D_{i_0}<1/2\), \(D_{i_0}\wedge D_j<1/2\) for all j in \(\{1,\ldots ,k\}\). Thus, \((Y_{i_0,t}+Y_{j,t+h})_{t\ge 1}\) and \((Y_{i_0,t}-Y_{j,t+h})_{t\ge 1}\) satisfy Assumption (A2) with \(D<1/2\). Hence, by Lévy-Leduc et al. (2011b, Proposition 3(b)) and the Delta method,
$$\begin{aligned} n^{D_{i_0}\wedge D_j}\left( {\widehat{\sigma }}_{+,i_0,j}^2-{{\,\textrm{Var}\,}}(Y_{i_0,t}+Y_{j,t+h})\right) =O_p(1),\text { as } n\rightarrow \infty \end{aligned}$$
and
$$\begin{aligned} n^{D_{i_0}\wedge D_j}\left( {\widehat{\sigma }}_{-,i_0,j}^2-{{\,\textrm{Var}\,}}(Y_{i_0,t}-Y_{j,t+h})\right) =O_p(1),\text { as } n\rightarrow \infty , \end{aligned}$$
which concludes the proof by (14).
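The decomposition (14) is an exact algebraic identity, which the short Python check below confirms numerically. The function names and the simulated white-noise data are illustrative assumptions; the estimators follow the displays in the proof, with the normalisation \(1/n\) and the full-sample means \({\bar{Y}}_i\) and \({\bar{Y}}_j\).

```python
import numpy as np


def classical_autocov(yi, yj, h):
    """Lagged mean product minus product of full-sample means."""
    n = len(yi)
    return np.sum(yi[: n - h] * yj[h:]) / n - yi.mean() * yj.mean()


def polarized_autocov(yi, yj, h):
    """Right-hand side of (14): (sigma_plus^2 - sigma_minus^2) / 4."""
    n = len(yi)
    a, b = yi[: n - h], yj[h:]
    mi, mj = yi.mean(), yj.mean()
    s_plus = np.sum((a + b) ** 2) / n - (mi + mj) ** 2
    s_minus = np.sum((a - b) ** 2) / n - (mi - mj) ** 2
    return 0.25 * (s_plus - s_minus)


rng = np.random.default_rng(3)
yi = rng.standard_normal(500)
yj = 0.3 * yi + rng.standard_normal(500)

max_abs_diff = max(
    abs(classical_autocov(yi, yj, h) - polarized_autocov(yi, yj, h))
    for h in range(6)
)
print("largest discrepancy over lags 0..5:", max_abs_diff)
```

The discrepancy is at the level of floating-point rounding, because the squared terms cancel and only the cross-products remain. Replacing the two variance estimates by squared robust scales is what turns this identity into a robust autocovariance estimator.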
About this article
Cite this article
Reisen, V.A., Lévy-Leduc, C., Monte, E.Z. et al. A dimension reduction factor approach for multivariate time series with long-memory: a robust alternative method. Stat Papers (2023). https://doi.org/10.1007/s00362-023-01504-2
DOI: https://doi.org/10.1007/s00362-023-01504-2