The Gaussian rank correlation estimator: robustness properties

Abstract

The Gaussian rank correlation equals the usual correlation coefficient computed from the normal scores of the data. Although its influence function is unbounded, it still has attractive robustness properties. In particular, its breakdown point is above 12%. Moreover, the estimator is consistent and asymptotically efficient at the normal distribution. The correlation matrix obtained from pairwise Gaussian rank correlations is always positive semidefinite and very easy to compute, even in high dimensions. We compare the properties of the Gaussian rank correlation with those of the popular Kendall and Spearman correlation measures. A simulation study confirms the good efficiency and robustness properties of the Gaussian rank correlation. In an empirical application, we show how it can be used for multivariate outlier detection based on robust principal component analysis.
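The first sentence of the abstract already gives a computable definition: replace each variable by its normal scores, then take the ordinary (Pearson) correlation. Below is a minimal sketch of that definition in standard-library Python. It assumes van der Waerden scores Φ⁻¹(R_i / (n + 1)), where R_i is the rank of observation i (suggested by the "Van der Waerden" keyword, but the exact score convention is an assumption here), and it assumes that tied values receive their average rank, which the abstract does not specify. All function names are hypothetical and illustrative, not the authors' code.

```python
from math import sqrt
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank (assumed tie rule)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block of tied values while the next sorted value is equal.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def normal_scores(values):
    """Van der Waerden normal scores Phi^{-1}(R_i / (n + 1)) (assumed convention)."""
    n = len(values)
    inv_cdf = NormalDist().inv_cdf  # standard normal quantile function
    return [inv_cdf(r / (n + 1)) for r in average_ranks(values)]


def pearson(a, b):
    """Ordinary sample correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    if saa == 0 or sbb == 0:
        raise ValueError("a variable is constant; correlation is undefined")
    return sab / sqrt(saa * sbb)


def gaussian_rank_correlation(x, y):
    """Pearson correlation of the normal scores of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    return pearson(normal_scores(x), normal_scores(y))


def gaussian_rank_corr_matrix(columns):
    """Pairwise Gaussian rank correlations for a list of equal-length columns."""
    scores = [normal_scores(col) for col in columns]  # one sort per variable
    p = len(columns)
    return [[1.0 if i == j else pearson(scores[i], scores[j]) for j in range(p)]
            for i in range(p)]


if __name__ == "__main__":
    x = [1, 2, 3, 4, 1000]  # one gross outlier in x
    y = [1, 2, 3, 4, 5]
    print(pearson(x, y))                    # pulled well below 1 by the outlier
    print(gaussian_rank_correlation(x, y))  # ranks are unchanged, so this is 1 up to rounding
```

Because every entry of the matrix is an ordinary correlation between columns of the same transformed data set, the result is the sample correlation matrix of the normal-score data. That is why it is positive semidefinite, as the abstract states, even though it is assembled pair by pair. The cost is one sort per variable plus the pairwise inner products, which is what makes the approach cheap in high dimensions.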

Author information

Correspondence to Christophe Croux.

About this article

Cite this article

Boudt, K., Cornelissen, J. & Croux, C. The Gaussian rank correlation estimator: robustness properties. Stat Comput 22, 471–483 (2012). https://doi.org/10.1007/s11222-011-9237-0

Keywords

  • Breakdown
  • Correlation
  • Efficiency
  • Robustness
  • Van der Waerden