Abstract
Principal component analysis (PCA) and canonical correlation analysis (CCA) are dimension-reduction techniques: in PCA a random vector is approximated in a lower dimensional subspace, while in CCA two random vectors from high dimensional spaces are reduced to a new pair of low dimensional vectors by applying a linear transformation to each of them. In both techniques, the concern is the closeness between the higher dimensional vector and its lower dimensional representation, measured here through a robust function. Robust SM-estimation has been treated in the context of PCA and CCA, showing outstanding performance under casewise contamination, which encourages the study of its asymptotic properties. We analyze consistency and asymptotic normality for the SM-canonical vectors. As a by-product of the CCA derivations, the asymptotics for PCA can also be obtained. A classical measure of robustness, the influence function, is analyzed, showing the usual performance of S-estimation in different statistical models. The general ideas behind SM-estimation in either PCA or CCA are specially tailored to the context of association, rendering robust measures of association between random variables. In this way, a robust correlation measure is derived, and its connection with the association measure provided by S-estimation for bivariate scatter is analyzed. We also propose a second robust correlation measure, which is reminiscent of depth-based procedures.
References
Adrover J, Donato SM (2015) A robust predictive approach for canonical correlation analysis. J Multivar Anal 133:356–376
Adrover J, Maronna R, Yohai V (2002) Relationships between maximum depth and projection estimates. J Stat Plan Inference 105:363–375
Alfons A, Croux C, Filzmoser P (2017) Robust maximum association estimators. J Am Stat Assoc 112(517):436–445
Anderson TW (1999) Asymptotic theory for canonical correlation analysis. J Multivar Anal 70(1):1–29
Boente G (1987) Asymptotic theory for robust principal components. J Multivar Anal 21(1):67–78
Branco JA, Croux C, Filzmoser P, Oliveira MR (2005) Robust canonical correlations: a comparative study. Comput Stat 20(2):203–229
Croux C, Haesbroeck G (2000) Principal component analysis based on robust estimators of the covariance or correlation matrix: influence functions and efficiencies. Biometrika 87:603–618
Croux C, Ruiz-Gazen A (2005) High breakdown estimators for principal components: the projection-pursuit approach revisited. J Multivar Anal 95:206–226
Croux C, García-Escudero LA, Gordaliza A, Ruwet C, San Martín R (2017) Robust PCA based on trimming. Stat Sin 27:1437–1459
Cui H, He X, Ng KW (2003) Asymptotic distributions of principal components based on robust dispersions. Biometrika 90:953–966
Das S, Sen PK (1998) Canonical correlations. In: Armitage P, Colton T (eds) Encyclopedia of biostatistics. Wiley, New York, pp 468–482
Davies PL (1987) Asymptotic behavior of S-estimators of multivariate location estimators and dispersion matrices. Ann Stat 15:1269–1292
Drašković G, Breloy A, Pascal F (2019) On the asymptotics of Maronna’s robust PCA. IEEE Trans Signal Process 67(19):4964–4975
Furrer R, Genton MG (2011) Aggregation-cokriging for highly multivariate spatial data. Biometrika 98:615–631
Hampel FR (1971) A general qualitative definition of robustness. Ann Math Stat 42(6):1887–1896
Hotelling H (1936) Relations between two sets of variables. Biometrika 28:321–377
Kudraszow N, Maronna RA (2010) Estimates of MM type for the multivariate linear model. Technical report. arXiv:1004.4883
Kudraszow N, Maronna RA (2011) Estimates of MM type for the multivariate linear model. J Multivar Anal 102:1280–1292
Li G, Chen Z (1985) Projection-pursuit approach to robust dispersion matrices and principal components: primary theory and Monte Carlo. J Am Stat Assoc 80:759–766
Maronna R (1976) Robust M-estimators of multivariate location and scatter. Ann Stat 4(1):51–67
Maronna R (2005) Principal components and orthogonal regression based on robust scales. Technometrics 47(3):264–273
Maronna R, Martin D, Yohai V, Salibián-Barrera M (2019) Robust statistics, theory and methods, 2nd edn. Wiley Series in Probability and Statistics. Wiley, New York
Pakes A, Pollard D (1989) Simulation and the asymptotics of optimization estimators. Econometrica 57(5):1027–1057
Rao CR (1962) Relations between weak and uniform convergence of measures with applications. Ann Math Stat 33:659–680
Seber GAF (2004) Multivariate observations, 2nd edn. Wiley, New York
Taskinen S, Croux C, Kankainen A, Ollila E, Oja H (2006) Influence functions and efficiencies of the canonical correlation and vector estimates based on scatter and shape matrices. J Multivar Anal 97:359–384
ten Berge JMF (1979) On the equivalence of two oblique congruence rotation methods and orthogonal approximations. Psychometrika 44:359–364
Tyler D (1981) Asymptotic inference for eigenvectors. Ann Stat 9(4):725–736
Wilms I, Croux C (2015) Sparse canonical correlation analysis from a predictive point of view. Biom J 57(5):834–851
Wilms I, Croux C (2016) Robust sparse canonical correlation analysis. BMC Syst Biol 10:72. https://doi.org/10.1186/s12918-016-0317-9
Yohai V (1987) High breakdown-point and high efficiency robust estimates for regression. Ann Stat 15(2):642–658
Yohai V, García Ben M (1981) Canonical variables as optimal predictors. Ann Stat 8(4):865–869
Yohai V, Zamar R (1988) High breakdown-point estimates of regression by means of the minimization of an efficient scale. J Am Stat Assoc 83(402):406–413
Acknowledgements
We would like to thank two anonymous referees and the Associate Editor for their comments and suggestions that have resulted in a much improved paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Jorge G. Adrover: Research partially supported by Grants PICT 0821 and 0397 from ANPCYT, Grant 05/B424 Secyt, UNC, and Grant 20020100100276 from Secyt, UBA, Argentina. Stella M. Donato: Research partially supported by Grants PICT 0397 from ANPCYT and Secyt, UNC, Argentina.
Appendix
Proof of Theorem 1
Let \({\mathscr {G}}\) be as in (22), \(\varvec{\varXi }_o=\varvec{\varSigma }_{{{\textbf {x}}}{{\textbf {x}}}}\), \(\varvec{\varGamma }_o=\varvec{\varSigma }_{{{\textbf {y}}}{{\textbf {y}}}}\) and the sets
The function \(g_P({{\textbf {A}}},{{\textbf {B}}},{{\textbf {a}}},\varvec{\varXi },\varvec{\varGamma },\sigma )= E_P\left( \frac{{{\textbf {A}}}\varvec{\varXi }^{-1/2}{{\textbf {x}}}-{{\textbf {B}}}\varvec{\varGamma }^{-1/2}{{\textbf {y}}}}{\sigma }\right) \) is continuous in \({\mathscr {G}}\) for \(P=F\). Given \(\epsilon >0\), by Theorem 1 in Adrover and Donato (2015), there exist \(\eta >0\) and \(0<\tilde{\eta }<\delta \)
Then, if \(({{\textbf {A}}},{{\textbf {B}}},{{\textbf {a}}},\varvec{\varXi },\varvec{\varGamma })\in E_{\eta }\), there exists \(n_0(\epsilon )\) such that the M-scale \({\hat{\sigma }}\in [\sigma _o-\epsilon ,\sigma _o+\epsilon ]\) for all \(n>n_0(\epsilon )\). If \(({{\textbf {A}}},{{\textbf {B}}},{{\textbf {a}}},\varvec{\varXi },\varvec{\varGamma })\notin E_{\eta }\), there exist \(\eta >0\) and \(M>0\) such that for all \(n>n_1(\epsilon )\) and \(F_{n}\in {\mathscr {F}}_{n}\), it holds that
Consequently, we can conclude that the SM-estimators in (15) belong to a closed bounded set. Moreover, \(\lim _{n\rightarrow \infty }{\hat{\sigma }}=\sigma _o\). Therefore, the Fisher consistency given in Theorem 1 in Adrover and Donato (2015) lets us conclude that any convergent subsequence must converge to \(({{\textbf {A}}}_o,{{\textbf {B}}}_o,{{\textbf {a}}}_o)\), and the consistency follows.
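The M-scale \({\hat{\sigma }}\) used in the argument above can be illustrated numerically. The following is only a minimal sketch under stated assumptions: the function \(\chi \) below (a bisquare-type function of the squared, scaled residual), the value \(\delta =0.5\), and the names `chi` and `m_scale` are hypothetical choices made for illustration, not the ones fixed in the paper. The sketch solves \(\frac{1}{n}\sum _i\chi (d_i/\sigma )=\delta \) by bisection, where \(d_i\) stands for a squared residual norm.

```python
import numpy as np

def chi(t):
    """Assumed bounded, nondecreasing chi with chi(0) = 0 and chi(t) = 1 for t >= 1."""
    t = np.clip(t, 0.0, 1.0)
    return 1.0 - (1.0 - t) ** 3

def m_scale(d, delta=0.5, n_iter=200):
    """Solve mean(chi(d / sigma)) = delta for sigma by bisection.

    d holds nonnegative values (e.g. squared residual norms).
    """
    d = np.asarray(d, dtype=float)
    if np.mean(d > 0) <= delta:
        raise ValueError("too many exact zeros: the M-scale equation has no solution")
    # chi(t) <= 3t, so mean(chi(d / hi)) <= delta at hi = 3 * max(d) / delta.
    lo, hi = 0.0, 3.0 * d.max() / delta
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        # mean(chi(d / sigma)) is nonincreasing in sigma.
        if np.mean(chi(d / mid)) > delta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

rng = np.random.default_rng(0)
d = rng.chisquare(2, size=500)   # toy squared norms, purely illustrative
sigma_hat = m_scale(d)
print(sigma_hat, np.mean(chi(d / sigma_hat)))
```

Because the equation depends on the data only through \(d_i/\sigma \), rescaling all \(d_i\) by a constant rescales the solution by the same constant.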
The following technical lemma is needed to prove the asymptotic normality.
Lemma A.1
Let \({{\textbf {z}}}_{1},\dots ,{{\textbf {z}}}_{n}\) be a random sample in \({\mathbb {R}}^{m}\) from an elliptical distribution F with density (18), location parameter \(\varvec{\mu }_{0}\) and dispersion parameter \(\varvec{\varSigma }_{0}\). Suppose that conditions C0–C14 hold. Let \({\mathscr {G}}\) be as in (22). Let \(\phi _{1}\) and \(\phi _{2}\) be defined as in (26). Then, there exist a function \({\tilde{\theta }}:{\mathscr {H}}\rightarrow {\mathscr {G}}\) and a bounded set \({\mathscr {C}}\subset {\mathscr {G}}\) such that \({\tilde{\theta }}_{o}\) is an interior point of \({\tilde{\theta }}\left( {\mathscr {C}}\right) \) and the sets \({\mathscr {F}}_{1i} =\left\{ \phi _{1i}\left( {{\textbf {z}}},{\tilde{\theta }}\left( \xi \right) \right) :\xi \in {\mathscr {C}}\right\} \), \(i=1,\ldots ,rm\), and \({\mathscr {F}}_{2k} =\left\{ \phi _{2k}\left( {{\textbf {z}}},{\tilde{\theta }}\left( \xi \right) \right) :\xi \in {\mathscr {C}}\right\} \), \(k=1,\ldots ,r\), are Euclidean classes with envelopes \(F_{1i}\), \(i=1,\ldots ,rm\), and \(F_{2k}\), \(k=1,\ldots ,r\), such that \(E_{F}\left( F_{1i}\right) ^{2} <\infty \) and \(E_{F}\left( F_{2k}\right) ^{2} <\infty \).
Proof of Theorem 2(i)
Given \({\varvec{\tilde{\theta }}}_{o}\), Lemma A.1 ensures that \(\phi _{1i}\left( {{\textbf {z}}},{\varvec{\tilde{\theta }}}_{o}\right) \in {\mathscr {F}}_{1i}\), \(i=1,\ldots ,mr\), and \(\phi _{2k}\left( {{\textbf {z}}},{\varvec{\tilde{\theta }}}_{o}\right) \in {\mathscr {F}}_{2k}\), \(k=1,\ldots ,r\). Moreover, since Theorem 1 ensures the consistency of \(\varvec{{\hat{\theta }}}=\left( \hat{{\textbf {{A}}}}_{SM}^{o},\hat{{\textbf {{B}}}}_{SM}^{o},\hat{{{\textbf {a}}}}_{SM},\hat{{\varvec{\varSigma }}}_{{\textbf {xx}}}^{(R)},{\varvec{\hat{\varSigma }}}_{{\textbf {yy}}}^{(R)},{\hat{\sigma }}\right) \) to \({\varvec{\tilde{\theta }}}_{o}\), given \(\varepsilon _{0}>0\) we can find \(n_{0}\) such that for any \(n\ge n_{0}\), it holds that \(P\left( \phi _{1i}\left( {{\textbf {z}}},{\varvec{\hat{\theta }}}\right) \in {\mathscr {F}}_{1i},\ \phi _{2k}\left( {{\textbf {z}}},{\varvec{\hat{\theta }}}\right) \in {\mathscr {F}}_{2k}\right) >1-\varepsilon _{0}\) for all \(i\in \left\{ 1,\ldots ,mr\right\} \) and \(k\in \left\{ 1,\ldots ,r\right\} \). Given a Euclidean class \({\mathscr {F}}\) and \(\delta >0\), set \([\delta ]=\left\{ (f_1,f_2)\in {\mathscr {F}}\times {\mathscr {F}}:\int (f_1-f_2)^2dP<\delta ^2\right\} \). Given a sequence of independent identically distributed random variables \(\xi _1,\dots ,\xi _n\) such that \(\xi _1\sim P\), set
Given \(\varepsilon >0\) and \(\eta >0\), Lemma 2.16 of Pakes and Pollard (1989), C12 and C13 imply that there exist \(\delta >0\) and \(n_{1}\in {\mathbb {N}}\) such that, for all \(n\ge n_{1}\), \(\left( \phi _{1j}\left( {{\textbf {z}}},{\varvec{\hat{\theta }}}\right) ,\phi _{1j}\left( {{\textbf {z}}},{\varvec{\tilde{\theta }}}_{o}\right) \right) \in \left[ \delta \right] \) and \(\limsup _{n\rightarrow \infty }P\left\{ \sup _{\left[ \delta \right] }\left| \nu _{n}\left( \phi _{1j}\left( \cdot ,\varvec{{\hat{\theta }}}\right) \right) -\nu _{n}\left( \phi _{1j}\left( \cdot ,\varvec{{\tilde{\theta }}}_{o}\right) \right) \right| >\eta \right\} <\varepsilon \). Then, we can conclude that
Since \(\varvec{\varPhi }\left( \varvec{{\tilde{\theta }}}_{o}\right) ={{\textbf {0}}}\), by adding and subtracting some terms we have
By C14 and (27), it holds that \(-\frac{1}{\sqrt{n}}\nu _{n}\left( \phi \left( \cdot ,\varvec{{\tilde{\theta }}}_{o}\right) \right) =\left[ \varvec{\varOmega }+o_{P}\left( 1\right) \right] \left( \varvec{{\hat{\theta }}}_{SM}-\varvec{\theta }_{o}\right) +o_{P}\left( 1/\sqrt{n}\right) \), and by the Central Limit Theorem we have that \(\nu _{n}\left( \phi \left( \cdot ,\varvec{{\tilde{\theta }}}_{o}\right) \right) \overset{{\mathscr {D}}}{\rightarrow } N_{r\left( m+1\right) }\left( {{\textbf {0}}},{{\textbf {V}}}\right) \). Since \(\varvec{\varOmega }\) is invertible, we obtain that \(\sqrt{n}\left( \varvec{{\hat{\theta }}}_{SM}-\varvec{\theta }_{o}\right) \overset{{\mathscr {D}}}{\rightarrow } N_{r\left( m+1\right) }\left( {{\textbf {0}}},\varvec{\varOmega }^{-1}{{\textbf {V}}}_{o}\left( \varvec{\varOmega }^{-1}\right) ^{t}\right) \). A straightforward computation yields the explicit form of the asymptotic dispersion matrix, which proves (i). (ii) follows closely from (i). (iii) follows from the fact that \(\varvec{{\hat{v}}}_{SM,k}\) and \(\varvec{{\hat{w}}}_{SM,k}\) are subvectors of \(\varvec{{\hat{\theta }}}_{SM}^{*}\) given in (ii). (iv) gives a simpler form of the asymptotic covariance matrix obtained in (iii) when the matrix of derivatives \(\frac{\partial \varPhi _2\left( \varvec{\theta }\right) }{\partial {{\textbf {a}}}}\left( \varvec{{\tilde{\theta }}_o}\right) \) is non-singular. \(\square \)
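The asymptotic dispersion matrix in (i) has the familiar sandwich form \(\varvec{\varOmega }^{-1}{{\textbf {V}}}_{o}\left( \varvec{\varOmega }^{-1}\right) ^{t}\). As a hedged illustration of how such a matrix could be evaluated numerically once \(\varvec{\varOmega }\) and \({{\textbf {V}}}_{o}\) are available, the sketch below uses linear solves rather than an explicit inverse. The function name `sandwich_covariance` and the toy \(2\times 2\) matrices are hypothetical and are not taken from the paper.

```python
import numpy as np

def sandwich_covariance(omega, v):
    """Return omega^{-1} @ v @ omega^{-T} without forming omega^{-1} explicitly."""
    omega = np.asarray(omega, dtype=float)
    v = np.asarray(v, dtype=float)
    left = np.linalg.solve(omega, v)           # omega^{-1} v
    # solve(omega, left.T) = omega^{-1} v^T omega^{-T}; transposing gives the sandwich.
    return np.linalg.solve(omega, left.T).T

# Toy inputs (purely illustrative): an invertible omega and a symmetric v.
omega = np.array([[2.0, 1.0],
                  [0.0, 1.0]])
v = np.array([[1.0, 0.5],
              [0.5, 2.0]])

acov = sandwich_covariance(omega, v)
print(acov)
```

When \({{\textbf {V}}}_{o}\) is symmetric, the result is symmetric as well, which gives a quick sanity check on any numerical implementation.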
Proof of Theorem 4
Given \(F_{\varepsilon }=\left( 1-\varepsilon \right) F+\varepsilon \delta _{{{\textbf {z}}}_{0}}\), we have to look for the SM-functionals defined as \(g\left( {{\textbf {D}}},{\varvec{\tilde{\varSigma }}}_{\varepsilon },{{\textbf {a}}},\sigma _{\varepsilon }\right) =E_{F_{\varepsilon }}\chi \left( \frac{\left\| {{\textbf {D}}}{\varvec{\tilde{\varSigma }}}_{\varepsilon }{{\textbf {z}}}-{{\textbf {a}}}\right\| ^{2} }{\sigma _{\varepsilon }\left( {{\textbf {D}}},{{\textbf {a}}}\right) }\right) =\delta \). Then, we look for a restricted minimum over \({{\textbf {D}}}\in {\mathscr {O}}_{r,m}\); writing \({{\textbf {t}}}_{1},\ldots ,{{\textbf {t}}}_{r}\) for the rows of \({{\textbf {D}}}\), the Lagrangian can be expressed as
where \({{\textbf {t}}}_{1,\varepsilon },\ldots ,{{\textbf {t}}}_{r,\varepsilon },{{\textbf {a}}}_{\varepsilon }\) are critical points for L. The proof closely follows that of Theorem 1 in Croux and Ruiz-Gazen (2005). \(\square \)
Proof of Lemma 3
The eigenvectors \(\left( 1,1\right) \) and \(\left( 1,-1\right) \) of \({\varvec{\tilde{\varSigma }}}\) correspond to the eigenvalues \(\left( (1-b)/(1+b)\right) ^{1/2} \) and \(\left( (1+b)/(1-b)\right) ^{1/2}\), respectively. Then, the quadratic forms in both definitions coincide and \({\hat{\lambda }}=b^{*}.\) \(\square \)
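The eigenpairs quoted in this proof can be checked numerically. The sketch below rests on an assumption that is not stated in this appendix: it takes \({\varvec{\tilde{\varSigma }}}\) to be the determinant-one normalization \(|\varvec{\varSigma }|^{1/2}\varvec{\varSigma }^{-1}\) of the inverse of the \(2\times 2\) matrix with unit diagonal and off-diagonal entry b, which is one form that reproduces the stated eigenvalues. The helper name `shape_inverse` is hypothetical.

```python
import numpy as np

def shape_inverse(b):
    """Assumed form of Sigma-tilde: sqrt(det(Sigma)) * inv(Sigma), Sigma = [[1, b], [b, 1]]."""
    sigma = np.array([[1.0, b], [b, 1.0]])
    return np.sqrt(np.linalg.det(sigma)) * np.linalg.inv(sigma)

b = 0.6
m = shape_inverse(b)

# Eigenvalues stated in the proof for eigenvectors (1, 1) and (1, -1).
lam_plus = np.sqrt((1 - b) / (1 + b))
lam_minus = np.sqrt((1 + b) / (1 - b))

e_plus = np.array([1.0, 1.0])
e_minus = np.array([1.0, -1.0])

print(m @ e_plus, lam_plus * e_plus)
print(m @ e_minus, lam_minus * e_minus)
```

Under this assumed normalization the two eigenvalues multiply to one, consistent with a matrix of unit determinant.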
Proof of Lemma 4
(i) and (ii) are easily derived. (iii) \({\hat{\lambda }}=1\) implies that \(s(1)\le s(\lambda )\) for all \(\lambda \in \left[ -1,1\right] \). Let \(q(\lambda )\) be as in (35). Since \(0=\lim _{s\rightarrow \infty }E\chi \left( q(\lambda )/s\right) \le \delta \), it follows that \(s(\lambda )<\infty \) and \(s(1)<\infty \). Thus, \(\delta \ge \lim _{\lambda \rightarrow 1^{-}}E\chi \left( q\left( \lambda \right) /s(\lambda )\right) \) and \(P(U=V)\ge 1-\delta \), which says that \(P\left( X=aY+b\right) \ge 1-\delta \), with \(a=S(F_{X})/S(F_{Y})\) and \(b=-aT(F_{Y})+T(F_{X})\). If \({\hat{\lambda }}=-1\), we get \(P(U=-V)\ge 1-\delta \) and \(P\left( X=cY+d\right) \ge 1-\delta \), with \(c=-S(F_{X})/S(F_{Y})\) and \(d=S(F_{X})T(F_{Y})/S(F_{Y})+T(F_{X})\). Thus, (iii) is proved. (iv) If (X, Y) is elliptically distributed with correlation \(\rho \), Lemma 3 lets us conclude that \({\hat{\lambda }}=\rho \). \(\square \)
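The passage from \(U=\pm V\) to an exact linear relation between X and Y in (iii) can be written out explicitly. The display below is a sketch that assumes, consistently with the stated values of a, b, c and d, that U and V are the standardized variables \(U=(X-T(F_{X}))/S(F_{X})\) and \(V=(Y-T(F_{Y}))/S(F_{Y})\); these definitions are not restated in this appendix.

```latex
\begin{align*}
U = V &\iff \frac{X - T(F_X)}{S(F_X)} = \frac{Y - T(F_Y)}{S(F_Y)} \\
      &\iff X = \frac{S(F_X)}{S(F_Y)}\,Y - \frac{S(F_X)}{S(F_Y)}\,T(F_Y) + T(F_X) = aY + b, \\
U = -V &\iff X = -\frac{S(F_X)}{S(F_Y)}\,Y + \frac{S(F_X)}{S(F_Y)}\,T(F_Y) + T(F_X) = cY + d.
\end{align*}
```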
Proof of Lemma 5
(i) is easily derived and (iii) follows as in Lemma 4 (iii). (ii) Since \((U\pm V)^2\ge 0\), then \(-(U^2+V^2)\le 2UV\le (U^2+V^2)\) and \(|2UV/(U^2+V^2)|\le 1\). Therefore, \(\text {med}(2UV/(U^2+V^2))\in [-1,1]\). (iv) If (X, Y) is elliptically distributed with correlation \(\rho \), then (U, V) is elliptically distributed with density \(f(u,v)=K^{-1}f_0((u^2+2\rho uv+v^{2})/\sqrt{1-\rho ^{2}})\) and \(K=\pi (1-\rho ^2)^{-1/2}(F_0( \infty )-F_0(0))\), with \(F_0\) a primitive of \(f_0\). To see that \(P_{c}=P\left( 2UV/(U^{2}+V^{2})\le \rho \right) =0.5\), we perform a change of variables to get that
Using spherical coordinates, \(1+\rho \cos 2\theta \ge 0\) and the fact that \(\frac{\cos 2\theta +\rho }{1+\rho \cos 2\theta }\le \rho \) if and only if \(\cos 2\theta \le 0\), we then have
which shows that \({\hat{\lambda }}=\rho \) and the result follows. \(\square \)
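The conclusion of (iv), that the median of \(2UV/(U^{2}+V^{2})\) equals \(\rho \) under an elliptical model, can be checked by simulation. The following is a minimal Monte Carlo sketch under stated assumptions: it uses the bivariate normal distribution as one particular elliptical family, takes U and V to be already standardized (zero location, unit scale), and the function names `median_correlation` and `simulate_bivariate_normal` are hypothetical.

```python
import numpy as np

def median_correlation(u, v):
    """Sample median of 2UV / (U^2 + V^2) for standardized u and v."""
    ratio = 2.0 * u * v / (u ** 2 + v ** 2)
    return float(np.median(ratio))

def simulate_bivariate_normal(rho, n, seed):
    """Draw n standardized bivariate normal pairs with correlation rho."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    z = rng.multivariate_normal(np.zeros(2), cov, size=n)
    return z[:, 0], z[:, 1]

rho = 0.5
u, v = simulate_bivariate_normal(rho, n=200_000, seed=1)
print(median_correlation(u, v))  # should be close to rho for large n
```

Every individual ratio lies in \([-1,1]\), in line with part (ii), so the sample median is automatically a bounded quantity.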
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Adrover, J.G., Donato, S.M. Aspects of robust canonical correlation analysis, principal components and association. TEST 32, 623–650 (2023). https://doi.org/10.1007/s11749-023-00846-1
DOI: https://doi.org/10.1007/s11749-023-00846-1